1  Introduction

1.1  The  User’s  Guide 

The PUNDIT User’s Guide is intended to provide a concise and general introduction to the
facilities of the PUNDIT text-processing system. The intended audience is computational
linguists familiar with Quintus Prolog. While this document is not a reference manual,
and does not in itself contain sufficient information for you to either extend the system or
port it to a new domain, we have tried to cover the operational basics: how to run PUNDIT
(Section 2) and how to interpret PUNDIT’s output (Section 3). In addition, Section 4
documents the two main procedures for accessing the system (parse and pundit), as well
as a number of other procedures which we use frequently as developers. Appendix
A and Appendix B will help you set the system up. Appendix D identifies the core and
domain files, and Appendix E lists papers, presentations, and technical documentation
available for PUNDIT.

1.2  The  Software 

The User’s Guide is designed to accompany a subset of the text-understanding software
which has been developed at the Paoli Research Center, as it exists on the date of
publication: the core components of PUNDIT, together with the domain-specific components
developed to process Navy tactical messages (RAINFORMs). This domain will be referred
to henceforth as the MUCK domain (an acronym for the message understanding
conference which occasioned the development of the software). The MUCK software is essentially
similar to that developed for other domains, and may be considered representative: it
includes a domain-specific message input screen, lexicon, knowledge base, semantics rules
and database definitions, and it supports both analysis of text and limited natural
language queries. It differs from other domain software chiefly in having a comparatively rich
knowledge base.
2  Running  PUNDIT

2.1  Core  Images  and  Domain  Images 

Before you can use PUNDIT, the software must be installed at your site and the images
built. Appendix A contains instructions for creating a PUNDIT core image and a MUCK
domain image.

The core image is not functional, and is generally used only to build the domain images^1.
In the discussion that follows, it will be assumed that you have a MUCK domain image
available to you.

2.2  The  MUCK  Domain 

The MUCK domain has been designed to process the Remarks field of Navy tactical
messages. Since the formatted fields in these messages contain information which establishes
the initial context for interpreting the text (message originator, date/time, etc.), we have
developed a special front-end to collect this information. This message front-end is
accessed by issuing the command pundit. See Section 4 for more information about this
command.

In order to make use of the MUCK domain image for syntactic and semantic analysis of
natural language input, you will need to know something about the sublanguage and the
knowledge base for this domain. In the file muck_working.pl you will find a subset of
the messages from our message corpus which PUNDIT is currently able to process. By
examining other domain-specific files such as the lexicon, the knowledge base, and the
semantics rules, you should be in a position to construct your own input (see Appendix
D for a list of these files).


2.3  parse  and  pundit

The pundit command (discussed above) invokes the domain-specific message processing
front-end to the system, which collects both message header information and the message
body. An alternative, domain-independent method of accessing the system is provided by
parse, which prompts only for the text to be processed. Many of the researchers working
on PUNDIT currently interact with the system using parse, although certain higher-level
processes (reference resolution in particular) do not perform as well as they otherwise
could, since the initial discourse context is empty. The parse command, however, provides
more options for developers, and is the only command to use when no semantic processing
is desired (the front-end invoked by pundit assumes that a complete analysis is required).
These two commands are discussed in more detail in Section 4.

^1 The core image contains only the core procedures of PUNDIT, including the core lexicon (see Appendix
D). See Appendix B for details on how to create a functional image from the core image.
2.4  Before  You  Begin

Since we will be using a text from the MUCK domain to illustrate PUNDIT’s operation, at
this point you may wish to load the MUCK image. Before using parse or pundit, however,
you will first need to set a few of the software switches which enable or disable various
system features. Do this by executing the switches procedure (described in more detail
in Section 4). The switches procedure will display the current switch settings in the
image, and will prompt you for a list of switches to be changed. Make sure, at least for
now, that you have the following switches turned on, and that all the others are turned
off:

1.  parse_tree

2.  conjunction

3.  semantics

4.  translated_grammar_present

5.  translated_grammar_in_use

6.  selection

At this stage you may also want to tell the Selection module not to query you about new
co-occurrence patterns. To do so, call the procedure ssucceed (see Section 4 for more details).

2.5  Processing a Sentence

Having brought up the MUCK domain image and set your switches, you are now ready
to analyze a sentence. Call parse, and you should see the prompt “sentence:”. Since
the following section describes the output generated from processing the sentence “visual
sighting of periscope followed by attack with asroc and torpedos.”, you might want to type it
in now, including the final period. After typing the sentence in, you will need to signal the
end of input by entering two carriage-returns. The following is a transcript of someone
doing what you have just been asked to do in the last two subsections^2.


^2 Note that if you later create a prolog.ini file, as described in Appendix C, your initial switch settings
may differ from those shown in the figure.
% /nlp/nlp/pundit/muck/Muck.qimage

Quintus Prolog Release 2.2 (Sun-3, Unix 3.2)

Copyright (C) 1987, Quintus Computer Systems, Inc. All rights reserved.
1310 Villa Street, Mountain View, California (415) 965-7700

| ?- switches.


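 1.  enter_new_word ---------------> OFF
 2.  np_trace ---------------------> OFF
 3.  parse_tree -------------------> OFF
 4.  conjunction ------------------> ON
 5.  semantics --------------------> OFF
 6.  translated_grammar_present ---> ON
 7.  translated_grammar_in_use ----> OFF
 8.  grinder ----------------------> OFF
 9.  text_mode --------------------> OFF
10.  decomposition_trace ----------> OFF
11.  summary ----------------------> OFF
12.  show_isr ---------------------> OFF
13.  selection --------------------> ON
14.  enable_db_access -------------> OFF
15.  count ------------------------> OFF
16.  all_time ---------------------> OFF
17.  time_trace -------------------> OFF
18.  window_display ---------------> OFF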

Please choose a list of switches, or type "ok." -- [3,5,7].

Changed the switch: parse_tree ---> ON

Changed the switch: semantics ---> ON

Changed the switch: translated_grammar_in_use ---> ON

yes

| ?- ssucceed.

Setting selection switch unknown_selection to ---> succeed

yes

| ?- parse.

sentence: visual sighting of periscope followed by attack with asroc
and torpedos.


Figure 1: Running PUNDIT
3  Interpreting  PUNDIT  Output

Syntactic processing in PUNDIT yields two syntactic descriptions of a sentence: a detailed
surface structure parse tree, and an operator-argument representation called the
Intermediate Syntactic Representation, or ISR. The ISR regularizes the information in the parse
tree, reducing surface structure variants to a single canonical form and eliminating details
not required for semantic analysis.

PUNDIT’s semantic and pragmatic components take the ISR as input and produce a final
representation of the information conveyed by the sentence. This representation includes a
decomposition of verbs into a structure of more basic predications, resolution of anaphoric
references, and an analysis of temporal relations. The resulting data structure is known as
the Integrated Discourse Representation, or IDR.

These  three  kinds  of  output  will  be  illustrated  for  the  following  sentence: 

Visual  sighting  of  periscope  followed  by  attack  with  asroc  and  torpedos. 

This particular sentence is characteristic of the sort of input PUNDIT has been designed to
handle. Note the ellipsis typical of message sublanguages^3.

3.1  The  Parse  Tree 

The syntactic analyses produced by PUNDIT are in the formalism of String Grammar
[Sager 81]. A brief glossary of String Grammar terms is provided below in figure (2) for
help in understanding the parse tree in figure (3). Parse trees are displayed with siblings
indented to the same depth; terminal elements (lexical items) are preceded by »».

3.2  The  ISR 

The ISR corresponding to the parse tree in figure (3) is shown in figure (4), which is
taken from the output of the parse procedure. Two versions of the ISR are given: the
first is essentially the data structure passed to semantic analysis, and the second is a
pretty-printed version.

The ISR requires little knowledge of string grammar to understand. Each clause consists
of syntactic operators (OPS: generally tense and aspect markers derived from the verb
morphology), the verb or predicate (VERB), and its arguments. Conjunction is indicated
by the insertion of the conjunction, followed by the conjuncts (set off by parallel lines).
Note that each noun phrase has an associated referential index; in this example, the ISR
has been printed after semantic and pragmatic analysis, and the indices have been bound
to discourse entities ([sight1], [periscope1], etc.).
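
Because the ISR is an ordinary Prolog term, it can be inspected with standard term
operations. The following is a minimal sketch, not part of PUNDIT: the predicate names
isr_example/1 and entity_label/2 are hypothetical, and the term is the ISR of figure (4)
transcribed with its nesting as best it can be read. The sketch collects the discourse-entity
label bound to each noun phrase by searching for nvar([Head, Number, [Label]]) subterms.

```prolog
:- use_module(library(lists)).   % for member/2

% Hypothetical fact holding the example ISR as a Prolog term.
isr_example(
    [ untensed, follow, subj(passive),
      obj([ tpos([]),
            [ gerund,
              nvar([sight, singular, [sight1]]),
              pp([of, [ tpos([]),
                        [nvar([periscope, singular, [periscope1]])]
                      ]])
            ],
            adj([visual])
          ]),
      pp([by, [ tpos([]),
                [ nvar([attack, singular, [attack1]]),
                  pp([with, [ and,
                              [tpos([]), [nvar([anti_submarine_rocket, singular, [rocket1]])]],
                              [tpos([]), [nvar([torpedo, plural, [torpedos1]])]]
                            ]])
                ]
              ]])
    ]).

% entity_label(+Term, -Label): Label is the entity label found inside some
% nvar([Head, Number, [Label]]) subterm of Term; further labels are found
% on backtracking, in left-to-right order.
entity_label(Term, Label) :-
    nonvar(Term),
    Term = nvar([_Head, _Number, [Label]]).
entity_label(Term, Label) :-
    compound(Term),
    Term =.. [_Functor | Args],
    member(Arg, Args),
    entity_label(Arg, Label).
```

With these definitions, the goal isr_example(ISR), findall(L, entity_label(ISR, L), Ls)
binds Ls to [sight1, periscope1, attack1, rocket1, torpedos1].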

^3 Translation: The visual sighting of a periscope was followed by an attack (on the submarine) with
anti-submarine rockets and torpedos.
lxr        <=>  a left-adjunct + x + right-adjunct construction, where x can be
                n     <=>  a common noun
                a     <=>  an adjective
                v     <=>  a verb
                ven   <=>  a past participle
                tv    <=>  a tensed verb
                ving  <=>  a present participle
                q     <=>  a quantity word
                pro   <=>  a pronoun
nstgo      <=>  noun string object
nstg       <=>  noun string
sa         <=>  sentence adjunct
pn         <=>  preposition + noun (prepositional phrase)
tpos       <=>  the/determiner (prenominal) position
qpos       <=>  quantity (prenominal) position
apos       <=>  adjective (prenominal) position
npos       <=>  noun (prenominal) position
venpass    <=>  past participle + passive
passobj    <=>  passive object
nullobj    <=>  null object (for intransitive verb)
thats      <=>  that + sentence object
objbe      <=>  object of be
vingo      <=>  present participle + object
commaopt   <=>  comma option
conj_wd    <=>  conjunction word
spword     <=>  special (conjunction) word
dstg       <=>  adverb string, where d stands for adverb.


Figure 2: A glossary of string-grammar terms
[Parse tree display: a fragment analyzed as a zero-copula construction, with terminal elements
visual, sighting, of, periscope, followed, attack, with, asroc, and, torpedos.]

Figure 3: Parse tree for Visual sighting of periscope followed by attack with asroc and
torpedos.
INTERMEDIATE SYNTACTIC REPRESENTATION (ISR):


[untensed,follow,subj(passive),obj([tpos([]),[gerund,nvar([sight,singular,
[sight1]]),pp([of,[tpos([]),[nvar([periscope,singular,[periscope1]])]]])],
adj([visual])]),pp([by,[tpos([]),[nvar([attack,singular,[attack1]]),pp([with,
[and,[tpos([]),[nvar([anti_submarine_rocket,singular,[rocket1]])]],[tpos([]),
[nvar([torpedo,plural,[torpedos1]])]]]])]]])]


OPS:    untensed
VERB:   follow
SUBJ:   passive
OBJ:    gerund: sight (sing) : [sight1]
            L_MOD: adj: visual
            R_MOD: pp: of
                       periscope (sing) : [periscope1]
PP:     by
            attack (sing) : [attack1]
                R_MOD: pp: with
                           and

                           anti_submarine_rocket (sing) : [rocket1]

                           torpedo (pl) : [torpedos1]


Figure 4: ISR for Visual sighting of periscope followed by attack with asroc and torpedos.
3.3  The  IDR

The  IDR  for  the  example  sentence  is  shown  in  figure  (5);  its  major  segments  are  labelled 
Ids,  Properties,  Events  and  Processes,  States,  and  Important  Time  Relations. 

The Ids segment lists all the id, is_group, and generic predications derived during the
analysis of the example sentence. Generic relations are established primarily to support
subsequent reference through generic they or one-anaphora^4. Id relations indicate the
semantic type of each non-group discourse entity, while the is_group relations specify the
semantic type, members, and cardinality of each group-level discourse entity. Thus, for
example, the id relation for the entity [sight1]^5, derived from the nominalization visual
sighting of periscope, indicates that the entity is an event, while the is_group relation for
the entity [projectiles1] indicates that the entity is a group of projectiles, consisting
of an unknown number of rockets and torpedos.

Relations in the Properties segment of the IDR are heterogeneous: these are
miscellaneous relations derived in the course of processing noun phrases. Prenominal adjectives
typically give rise to such relations; processing of noun-noun compounds may generate
unspecified_relationship predications if no relationship between the nouns can be
derived from domain knowledge. In the current example, the reportingPlatform relations
are generated by a procedure which creates a default entity if the identity of the
message originator is not known. If we had used the pundit procedure instead of parse, this
information would have been supplied by the message header.

The Events and Processes and States segments of the IDR contain predications over
discourse entities which denote situations^6. Typically it is the processing of a clause or a
nominalization which gives rise to a situation entity, and if the situation is an event, then
an entity will be generated for the resulting state as well. The main predicate is the type
of situation (event, state, or process), and each predication has three arguments:

1.  The  discourse  entity 

2.  The  associated  semantic  representation 

3.  A  moment  or  period  of  time  for  which  the  situation  holds 

For example, the first predication in the Events and Processes segment in figure (5)
was derived from processing the ISR for the nominalization visual sighting of periscope.
This particular predication asserts that the referent introduced by the gerund sighting
denotes an event; the semantic representation was constructed based on the semantics
rules for the verb sight. All situations that are labelled events in PUNDIT can be more
accurately described as transitions from one state into another, where the full temporal
structure of the event consists of an initial process interval, the moment of transition,
and the new situation that is entered into^7. In the second argument of the predication,
the becomeP operator takes as its argument the semantic representation that gives rise to
the new situation that is entered into, [sight2]. The third argument of the predication,
moment([sight1]), should be interpreted functionally as returning the moment at which
the transition into the state in question occurred. Information about this new state,
[sight2], is provided by a predication in the States field.

^4 See [Dahl 84] for a description of the relationship between generics and one-anaphora.

^5 Labels for discourse entities are derived from the lexical head of the expression and are typically
enclosed in brackets. These labels are arbitrary; [entity2] would do equally well.

^6 See [Passonneau 87] for a more detailed discussion of the semantics of situations.

The final segment of the IDR lists the temporal relations which were analyzed as holding
amongst the situations. Note in particular that since the verb follow is defined as a
temporal operator, PUNDIT has correctly established the temporal relationship between
the sighting and the attack.
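
The relationship between an event, its resulting state, and the becomeP operator can be
sketched with a few Prolog facts. This is an illustration only, not PUNDIT's internal
representation: the predicates situation/4, time_relation/1, and result_state/2 are
hypothetical, the platform entity is abbreviated to reporting_platform, and the
sighted_atP predications are omitted for brevity.

```prolog
% situation(Type, Entity, SemanticRepresentation, Time): one fact per
% situation predication, following the three-argument layout of the IDR.
situation(event, sight1,
          becomeP(sightP(experiencer(reporting_platform),
                         theme(periscope1),
                         instrument(visual))),
          moment(sight1)).
situation(process, attack1,
          doP(attackP(actor(reporting_platform),
                      theme(_),
                      instrument(projectiles1))),
          period(attack1)).
situation(state, sight2,
          sightP(experiencer(reporting_platform),
                 theme(periscope1),
                 instrument(visual)),
          period(sight2)).

% Temporal relations between situations (simplified).
time_relation(starts_with(sight2, sight1)).
time_relation(precedes(sight1, attack1)).

% result_state(+Event, -State): State is the state entered by Event. The
% argument of becomeP must match the state's semantic representation, and
% the state must be recorded as starting with the event.
result_state(Event, State) :-
    situation(event, Event, becomeP(Rep), _),
    situation(state, State, Rep, _),
    time_relation(starts_with(State, Event)).
```

Under these assumptions the goal result_state(sight1, S) binds S to sight2, while
result_state(attack1, S) fails, since a process introduces no resulting state.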


^7 There is no referent introduced for the initial process interval of transition events.
Ids:

generic(torpedo)
is_group([torpedos1],members(torpedo,[torpedos1]),numb(_21227))
generic(anti_submarine_rocket)
id(anti_submarine_rocket,[rocket1])
is_group([projectiles1],members(projectile,[[rocket1],[torpedos1]]),numb(_21279))
id(us_platform,[us_platform1])
id(process,[attack1])
generic(periscope)
id(periscope,[periscope1])
id(us_platform,[us_platform5])
id(state,[sight2])
id(event,[sight1])

Properties:

reportingPlatform([us_platform1])
reportingPlatform([us_platform5])

Events and Processes:

event(
    [sight1]
    becomeP(sightP(experiencer([us_platform5]),theme([periscope1]),instrument(visual)))
    sighted_atP(theme([periscope1]),location(_28507))
    moment([sight1]))

process(
    [attack1]
    doP(attackP(actor([us_platform1]),theme(_19607),instrument([projectiles1])))
    period([attack1]))

States:

state(
    [sight2]
    sightP(experiencer([us_platform5]),theme([periscope1]),instrument(visual))
    sighted_atP(theme([periscope1]),location(_28507))
    period([sight2]))

Important Time Relations:

the sight state ([sight2]) started with the sight event ([sight1])
the sight event ([sight1]) preceded the arbitrary event time (moment([attack1]))
    of the attack process ([attack1])


Figure 5: IDR for Visual sighting of periscope followed by attack with asroc and torpedos.
4  Commonly  Used  Procedures

4.1  edit_rule

The procedure edit_rule/1 allows you to edit a set of grammar rules for a specified
non-terminal, using the Prolog Structure Editor. For more details, please consult [Riley 86].

4.2  edit_word

The procedure edit_word/1 allows you to edit the lexical entry for a specified word, using
the Prolog Structure Editor. For more details, please consult [Riley 86].


4.3  parse 

The procedures parse and pundit (see below) provide two slightly different front-ends to
the PUNDIT system. The parse procedure is the access method of preference for those whose
primary interest is parsing or minimizing keystrokes (no prompts are issued to collect message
header information). It is a core component of PUNDIT, and is domain-independent.

The behavior and output of parse are largely controlled by switch settings (see Section
4.10). Briefly, the parse procedure collects the input to be analyzed by PUNDIT, and then
calls syntactic analysis. Depending on your switch settings, it may then call semantic
analysis, the database extractor, and the summary module (if defined for the current
domain). Depending again on switch settings, you may be shown both intermediate
and final results: trace messages, the parse trees, the ISRs, the IDR, database relations
extracted, and a summarization of the input text^8. In the course of processing your input,
PUNDIT may engage you in dialogue if certain switches are turned on: for example, the
Selection module may ask you about co-occurrence patterns; if the switch enter_new_word
is on, you will be prompted to enter lexical information for new words.

The initial prompt to collect the input depends on switch settings as well. If the switch
text_mode is on, you will be prompted to enter a paragraph of text: that is, one or more
sentences followed by two carriage returns^9. In this case, the input will be processed one
sentence at a time, and the first parse for each sentence will be processed.

If the switch text_mode is off, you will be prompted to enter a single sentence; after
processing the first parse, you will be invited to continue with the next parse, until you
wish to stop or all parses have been exhausted.

^8 The summary application is not implemented in the MUCK domain.

^9 Since each sentence may optionally be followed by one carriage return, the extra carriage return at
the end is needed to signal the end of input. Moreover, although PUNDIT will process run-on sentences
(without punctuation), the final sentence must have a terminator: a period, exclamation point, or question
mark.
In addition to these capabilities, designed for the processing of sentences, you may also
analyze lower-level constituents. To process an isolated noun phrase, call parse_np/0 (this
procedure supports both syntactic and semantic analysis). NPs and other constituents
may also be parsed by invoking parse/1, giving as argument the grammatical category
(this will require a knowledge of PUNDIT’s grammatical categories). As a simple
illustration, you may parse the noun phrase visual sighting of periscope by calling parse(lnr).
Note, however, that parse(lnr) does not support semantic analysis.

4.4  pundit 

The pundit procedure provides a domain-specific front-end to the PUNDIT system, one
geared specifically towards full message processing. Since pundit is similar in many
respects to parse (see above), only differences will be described here.

First, pundit is not sensitive to the semantics and text_mode switches: it is assumed that
all messages require semantic analysis, and that all input will be one or more sentences
of text. As a result, it is not possible to request multiple parses of the input. However,
if a sentence fails semantic analysis, pundit will backtrack for the next parse, and this
process will continue until a semantically acceptable parse is found.
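
The backtracking behavior just described is the familiar Prolog generate-and-test pattern.
The sketch below is a toy illustration and not PUNDIT code: candidate_parse/1,
semantically_acceptable/1, and first_acceptable/1 are hypothetical names, and the parses
are stand-in atoms rather than real parse trees.

```prolog
% Toy candidate parses for one sentence, in the order a parser would offer them.
candidate_parse(parse1).
candidate_parse(parse2).
candidate_parse(parse3).

% Toy semantic check: parse1 is rejected, the others are acceptable.
semantically_acceptable(parse2).
semantically_acceptable(parse3).

% first_acceptable(-Parse): generate candidates in order and commit to the
% first one that passes the semantic check. When a candidate fails the
% check, Prolog backtracks into candidate_parse/1 for the next one.
first_acceptable(Parse) :-
    candidate_parse(Parse),
    semantically_acceptable(Parse),
    !.
```

Here the goal first_acceptable(P) binds P to parse2: parse1 is generated first, fails the
semantic check, and is discarded on backtracking.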

Secondly, pundit provides a domain-specific message entry screen which collects the
message header and the message body. The screen for the MUCK domain is shown in Figure
(6) below (you may enter a question mark at any prompt to receive a description of valid
responses). The responses to the first four prompts are used to establish the discourse
context for the interpretation of the message body.

The pundit procedure also provides capabilities for processing one or more existing
messages from the message corpus (stored in <domain>_working.pl). When you first invoke
pundit, the message corpus is compiled into your image, creating entries in the recorded
database^10. At the prompt for Message number, you may enter the number of an existing
message, and pundit will fetch the message from the recorded database and process it.
If you wish to process a list of existing messages, call pundit(batch, YourList), where
YourList is a Prolog list of message numbers. You may also process the entire message
corpus by calling pundit(batch, test_pundit)^11.


^10 If there is a version of the message corpus in your directory, pundit will load that; otherwise, it will
load the file from the main domain directory. This feature allows you to maintain a personal corpus of
texts.

^11 This is the method which we use to test software changes: the output can be saved in a file and
compared against the results of testing a previous image.
% ~nlp/pundit/muck/Muck.qimage +

Loading /usr/local/bin/em215 with /mn2/q2.2/ml...

Unix Prolog+Emacs V2.15 (01-Jan-88)

Copyright (c) 1986, 1987 Unipress Software, Inc.

Quintus Prolog Release 2.2 (Sun-3, Unix 3.2)

Copyright (C) 1987, Quintus Computer Systems, Inc. All rights reserved.
1310 Villa Street, Mountain View, California (415) 965-7700

[consulting /mn2/cball/prolog.ini...]

Setting selection switch unknown_selection to ---> succeed

[prolog.ini consulted 0.133 sec 720 bytes]

| ?- pundit.

[compiling /nlp/nlp/pundit/muck/muck_working.pl...]

[muck_working.pl compiled 2.700 sec 12,612 bytes]

************************ RAINFORM MESSAGE ENTRY ************************

Message number [1] :11

Enemy platform [barsuk] :submarine

Reporting platform [virginia] :texas

Report time [0800t] :0800t

Sighting message: sighted periscope an asroc was fired proceeded to
station visual contact lost, constellation helo hovering in vicinity,
sub appeared to be ooa.

Processing discourse segment ...


Segment processing Time: 39.967 sec.


***************************** Complete IDR *****************************

(etc.)


Figure 6: Using the pundit procedure
4.5  punt

This procedure provides on-line documentation for several PUNDIT utilities: the Prolog
Structure Editor, the Lexical Entry Procedure, tools for creating a concordance, and the
Dictionary Merge utility. To invoke the punt utility, type punt at the Prolog prompt.

4.6  rdb_remove

This development utility removes entries of specified type(s) from the Prolog recorded
database. It is useful when testing changes to one of the files whose compilation creates
such entries. For example, the pundit procedure, as one of its steps, compiles the message
corpus into your current image. If you should wish to edit and reload the message file
(<domain>_working.pl), you must first remove the old messages: rdb_remove facilitates
this task. A sample session is given below.

| ?- rdb_remove.

Recorded Database Rules:

 1.  The Lexicon (dict)
 2.  The Bnf (bnf)
 3.  Define and Simplification Rules (define)      [obsolete]
 4.  Semantic Selection Rules (semantics)          [obsolete]
 5.  Clause Mapping Rules (mapping)                [obsolete]
 6.  Noun Phrase Mapping Rules (mapping_np)        [obsolete]
 7.  All Semantics Rules (all_semantics)           [obsolete]
 8.  The Selectional Patterns (selection)
 9.  The Stable Messages (messages)
10.  quit

Please choose a list of items -- [9].

Erasing corpus muck...

Time to erase the testing messages: 0.15 sec.

Figure 7: Using the rdb_remove utility

Note that options 3-7 are obsolete (semantics rules are not stored in the recorded database).
4.7  readln

The procedure readln/1 loads a PUNDIT lexicon into the current image. Its argument
is the name of a lexicon file. For example, to load the lexicon file my_lex.pl from the
current working directory, execute the goal readln(my_lex). Lexical entries are stored in
the recorded database; to avoid duplicate entries, it may be necessary to run rdb_remove
to remove previous entries before using readln to load a new lexicon.
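
The duplication problem can be seen with the standard recorded-database built-ins
recorda/3, recorded/3, and erase/1. The following is a small sketch under stated
assumptions, not PUNDIT's loader: the key dict, the entry/2 wrapper, and the predicate
names load_entry/2, entry_count/2, and remove_entries/1 are hypothetical.

```prolog
% load_entry(+Word, +Entry): record a lexical entry under the key dict.
% Nothing prevents the same entry from being recorded twice.
load_entry(Word, Entry) :-
    recorda(dict, entry(Word, Entry), _Ref).

% entry_count(+Word, -N): N is the number of recorded entries for Word.
entry_count(Word, N) :-
    findall(x, recorded(dict, entry(Word, _), _), Xs),
    length(Xs, N).

% remove_entries(+Word): erase every recorded entry for Word. References
% are collected first, then erased one by one.
remove_entries(Word) :-
    findall(Ref, recorded(dict, entry(Word, _), Ref), Refs),
    erase_all(Refs).

erase_all([]).
erase_all([Ref | Refs]) :-
    erase(Ref),
    erase_all(Refs).
```

Loading the same entry twice leaves two records under the key; calling remove_entries/1
before reloading leaves exactly one, which is the effect of running rdb_remove before readln.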

4.8  squery 

The predicate squery/0 is used to control the behavior of the Selection component when
it encounters an unknown selectional pattern. Execute the goal squery to be queried
when an unknown pattern is encountered. For more details, see Section 12 of [Lang 87].


4.9  ssucceed 

The predicate ssucceed/0 is analogous to squery/0, except that it is used to allow
unknown selectional patterns to succeed. There is also a predicate sfail/0 which can be
used to force unknown selectional patterns to fail. For more details, see [Lang 87].
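
The effect of these settings can be pictured as a single mode flag consulted whenever a
pattern is not already known. The sketch below is an assumption-laden illustration, not the
Selection module itself: known_pattern/1, set_unknown_selection/1, and check_pattern/1
are hypothetical names, the sample pattern is invented, and the interactive query mode is
left out because it would require prompting the user.

```prolog
:- dynamic unknown_selection/1.

% Current mode for unknown patterns: succeed or fail.
unknown_selection(succeed).

% A toy table of known co-occurrence patterns (invented for illustration).
known_pattern(fire(platform, projectile)).

% set_unknown_selection(+Mode): replace the current mode.
set_unknown_selection(Mode) :-
    retractall(unknown_selection(_)),
    assertz(unknown_selection(Mode)).

% check_pattern(+Pattern): known patterns always succeed; unknown patterns
% succeed only when the mode is succeed.
check_pattern(Pattern) :-
    known_pattern(Pattern),
    !.
check_pattern(_Pattern) :-
    unknown_selection(succeed).
```

After set_unknown_selection(fail), a goal such as check_pattern(sight(platform, rock))
fails, while the known pattern fire(platform, projectile) still succeeds.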
4.10  switches

The switches utility allows you to control the operation of PUNDIT. Each switch and its
dependencies are described in more detail below.


| ?- switches.


 1.  enter_new_word ---------------> OFF
 2.  np_trace ---------------------> OFF
 3.  parse_tree -------------------> OFF
 4.  conjunction ------------------> ON
 5.  semantics --------------------> OFF
 6.  translated_grammar_present ---> ON
 7.  translated_grammar_in_use ----> OFF
 8.  grinder ----------------------> OFF
 9.  text_mode --------------------> OFF
10.  decomposition_trace ----------> OFF
11.  summary ----------------------> OFF
12.  show_isr ---------------------> OFF
13.  selection --------------------> ON
14.  enable_db_access -------------> OFF
15.  count ------------------------> OFF
16.  all_time ---------------------> OFF
17.  time_trace -------------------> OFF
18.  window_display ---------------> OFF


Please choose a list of switches, or type "ok." -- [5,7,9].


Changed the switch: semantics ---> ON

Changed the switch: translated_grammar_in_use ---> ON

Changed the switch: text_mode ---> ON


Figure 8: Using the switches utility


Several related procedures are useful in this connection. The procedure status displays
the current switch settings; flip/1 reverses the setting of one switch (for example,
flip(semantics)); turn_on/1 and turn_off/1 turn a specified switch on and off.
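
As an illustration of these procedures, the sketch below groups several switch settings into one goal. The predicate name parse_only_setup/0 is hypothetical; the sketch assumes a running PUNDIT image in which turn_on/1, turn_off/1 and status are defined as described above.

```prolog
% Hypothetical convenience predicate (not part of PUNDIT): configure the
% image for syntactic work only, then display the resulting settings.
parse_only_setup :-
    turn_off(semantics),   % parse only; skip semantic and pragmatic analysis
    turn_on(parse_tree),   % print the parse tree and the ISR
    turn_off(text_mode),   % one sentence at a time, stepping through all parses
    status.                % show the current switch settings
```

Using turn_on/1 and turn_off/1 rather than flip/1 makes the result independent of the settings in force when the goal is called.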
4.10.1  enter_new_word

This  switch  controls  the  behavior  of  pundit  when  lexical  lookup  encounters  a  word  which 
is  not  in  the  lexicon  and  which  cannot  be  analyzed  by  the  Shapes  module.  If  the  input  to 
PUNDIT  contains  an  unrecognizable  word  and  this  switch  is  off,  lexical  lookup  will  issue 
the  following  error  message: 

No definition found for -- <UNKNOWN-WORD>
sentence  failed  . . . 

If  the  switch  is  on, you  will  be  given  the  following  options: 

1.  Respell  word 

2.  Add  dictionary  entry 

3.  Word  is  a  proper  noun 

4.  Quit 

Choose the first option if you have simply misspelled the word. If the word is a proper
name, you may choose the third option (but no dictionary entry will be created). If
you choose to add a new dictionary entry, the Lexical Entry Procedure is invoked, and
you will be prompted to enter morphological and grammatical information, which may
optionally be saved in a file in your directory (consult [Riley 88] and [Linebarger 88] for
more detail). Note that the information collected will allow PUNDIT to proceed with the
syntactic analysis of the input, but may not be sufficient to enable semantic analysis: for
this, it may be necessary to add new semantics rules and/or update the knowledge base.


4.10.2  np_trace

This  switch  controls  the  display  of  Reference  Resolution  trace  messages  concerning  the 
creation  of  discourse  entities.  Turning  this  switch  on  will  only  have  an  observable  effect  if 
the  semantics  switch  is  turned  on  as  well. 


4.10.3  parse_tree 

This  switch  controls  printing  of  the  parse  tree  and  the  ISR.  The  parse  tree  and  ISR  are 
always  computed  whether  this  switch  is  on  or  not. 
4.10.4  conjunction

This switch is one of several switches that you cannot set directly. The switch will be on if
the conjunction meta-rule has been applied to the grammar, and will be off otherwise. If
this switch is off, and you want the grammar to include conjunction, run the procedure
gen_conj/0. After the meta-rule has been applied, the switch will automatically be turned
on. Since the meta-rule cannot be undone, the switch cannot subsequently be turned off.


4.10.5  semantics 

Turn  this  switch  on  to  enable  semantic  and  pragmatic  analysis  of  input;  turn  it  off  if 
you  wish  only  to  parse.  Only  the  parse  procedure  is  sensitive  to  this  switch:  the  pundit 
procedure  assumes  that  you  want  a  full  analysis  of  the  input. 

4.10.6  translated_grammar_present

The  switch  indicates  whether  or  not  the  grammar  has  been  translated  into  Prolog.  The 
switch  is  on  in  the  software  which  accompanies  this  document,  and  cannot  be  turned  off. 

If  at  your  site  an  image  has  been  developed  in  which  this  switch  is  off,  then  the  grammar 
must  be  run  interpreted.  Running  interpreted  is  slow,  but  it  facilitates  debugging  and 
rapid  grammar  changes.  Turning  the  switch  on  will  translate  the  grammar,  which  may 
take  a  few  minutes;  after  translation,  you  will  be  given  the  option  to  compile  the  resulting 
Prolog  code.  You  will  normally  want  to  do  this,  because  the  compiled  translated  grammar 
provides  the  fastest  parsing.  The  only  reason  not  to  do  this  is  if  you  want  to  use  the  Prolog 
debugger on the translated code, which is not advised. If at any time you want to compile
the translated grammar, compile the file translated_grammar.pl.


4.10.7  translated_grammar_in_use

This  switch  allows  you  to  parse  with  the  grammar  translated  (on)  or  interpreted  (off). 
Although  the  switch  is  off  in  the  software  which  accompanies  this  document,  you  will 
normally  want  it  to  be  on  (for  the  fastest  parsing).  The  only  reason  to  turn  this  switch  off 
is  to  make  use  of  certain  grammar  debugging  tools  that  are  only  available  when  interpreting 
the  grammar,  such  as  grinding  and  counting. 


4.10.8  grinder 

This switch allows you to trace the application of grammar rules and restrictions, a
development feature which is only available when parsing with the grammar interpreted (if
you turn this switch on, the translated_grammar_in_use switch will automatically be
turned off).

The  facility  is  called  grinder  because  it  typically  produces  considerable  output.  To  reduce 
the  amount  of  output,  you  may  choose  to  trace  only  the  application  of  specific  grammar 
rules  or  restrictions. 

| ?- turn_on(grinder).

Enter one of: [<what you want to grind on>],
              off, or
              all

** WARNING ** If you grind at all, you will automatically run interpreted.
Enter choice:


Figure  9:  Setting  the  grinder  switch 


4.10.9  text_mode

This switch is used by the procedure parse. If it is on, you will be prompted to enter
a paragraph of text (one or more sentences followed by two carriage returns). Only the
first parse for each sentence in the paragraph will be processed. If the switch is off, you
will be prompted to enter a single sentence, and you may step through all parses for that
sentence.

4.10.10  decomposition_trace

This switch allows you to monitor the course of semantic analysis: if it is on, a variety
of trace messages will be displayed, including the ISR for each clause about to be
processed and the semantic representation of the input as it is built up. While the switch
was designed to facilitate development of semantics rules and the knowledge base, the
trace messages are also useful when diagnosing the source of an incorrect or unsuccessful
semantic analysis. Note that decomposition_trace has no effect unless the semantics
switch is also on.

4.10.11  summary 

This switch controls whether or not a domain-specific module is called to create a
summary of the input text. Since summaries depend on the output of semantic analysis,
the semantics switch must be turned on. Note: the summary application has not been
implemented in the MUCK domain.
4.10.12  show_isr

This switch controls the display of the ISR; its effect depends on whether you are using
parse or pundit. If the switch is on and you are using the parse procedure, the
incremental ISR will be displayed for each node in the parse tree. This is useful for debugging
changes to the ISR, but not recommended otherwise. Note that the parse_tree switch
must also be on in this case (when using parse, you cannot see the ISR without also
displaying the parse tree).

If you are using the pundit procedure and this switch is on, the ISR for each sentence
will be displayed after syntactic analysis and before semantic analysis. In this case, the
parse_tree switch need not be on.
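
The dependency between show_isr and parse_tree when using parse can be captured in a small wrapper. This is a sketch only: the name show_incremental_isr/0 is hypothetical, and the code assumes a running PUNDIT image providing turn_on/1 and parse/0.

```prolog
% Hypothetical wrapper (not part of PUNDIT): display the incremental ISR
% for each parse tree node while using the parse procedure.
show_incremental_isr :-
    turn_on(parse_tree),   % required: with parse, the ISR is shown only with the tree
    turn_on(show_isr),
    parse.                 % prompts for the input to be parsed
```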


4.10.13  selection 

This switch controls whether or not the Selection module is invoked in the course of
parsing. If it is on, Selection will be called; if it is off, Selection will not be called. For
more details, see [Lang 87].

4.10.14  enable_db_access

This switch controls whether or not queries and assertions access the database defined for
the current domain. It is used by the procedures parse and pundit. If the switch is on,
domain-specific database definitions will be used to extract database relations from the
results of semantic analysis, and these relations will be displayed on your screen.

Dependencies: semantics must be turned on, and database relations must be defined for
the current domain (<domain>_db_structure.pl and <domain>_db_mapping.pl).


4.10.15  count 

This  switch  should  be  left  off. 


4.10.16  all_time

This switch controls the display of the time relations segment of the IDR. If it is off, the
segment is labelled Important Time Relations and contains what are judged to be the
most prominent temporal relations discovered during temporal analysis of the input. If
it is turned on, the segment is labelled Complete Time Relations, and all the relations
that could be discovered are displayed. Turning this switch on will only have an observable
effect if the semantics switch is turned on as well.



4.10.17  time_trace

This  switch  allows  you  to  monitor  the  course  of  temporal  analysis.  If  it  is  on,  informative 
trace  messages  will  be  displayed  about  situation  representations  as  they  are  constructed 
by  the  Time  component.  Turning  this  switch  on  will  only  have  an  observable  effect  if  the 
semantics  switch  is  turned  on  as  well. 


4.10.18  window_display

This switch should be left off.
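
Several of the switches described in Section 4.10 are documented as requiring the semantics switch: np_trace, decomposition_trace, summary, enable_db_access, all_time and time_trace. The sketch below records those dependencies as facts and turns semantics on first when needed. The predicate names needs_semantics/1 and turn_on_checked/1 are hypothetical, and the code assumes a running PUNDIT image providing turn_on/1.

```prolog
% Hypothetical table (not part of PUNDIT) of switches that Section 4.10
% documents as depending on the semantics switch.
needs_semantics(np_trace).
needs_semantics(decomposition_trace).
needs_semantics(summary).
needs_semantics(enable_db_access).
needs_semantics(all_time).
needs_semantics(time_trace).

% Turn a switch on, first turning semantics on if the switch depends on it.
turn_on_checked(Switch) :-
    (   needs_semantics(Switch)
    ->  turn_on(semantics)
    ;   true
    ),
    turn_on(Switch).
```

For example, turn_on_checked(time_trace) would turn on both semantics and time_trace. Recall that only parse is sensitive to the semantics switch; pundit always performs a full analysis.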
A  Installing  the  System

The PUNDIT system runs under release 4.3 of Berkeley UNIX and Quintus Prolog
(currently release 2.2). Before installing PUNDIT, a /nlp partition should first be created;
this partition should contain the directory /nlp/nlp/pundit, where the core PUNDIT
components will be installed. Software for the MUCK domain will be installed in the
/nlp/nlp/pundit/muck subdirectory.

If these partitions and directories cannot be created, several absolute path names in
PUNDIT code will require modification: the files and lines of code are listed below. If
it is necessary to create alternative directories to those recommended, please ensure that
core PUNDIT files and domain-specific files are stored in separate directories.

FILENAME         CODE

punt.pl          asserta(home_dir("/nlp/nlp/pundit/")).
qprologl5.pl     timeCom :- unix(shell('/Bin2/AI/nlp/bin/timeCom')).
sem_edit.pl      compile('~nlp/pundit/semed/correctForms.pl').
switches.pl      compile('~nlp/pundit/count_on.pl').
switches.pl      compile('~nlp/pundit/count_off.pl').
compilePundit    pundit_directory('/nlp/nlp/pundit').
compileMuck      muck_directory('/nlp/nlp/pundit/muck').

We strongly recommend that the files in the PUNDIT home directory (and its
subdirectories) be owned by a special user, and that the file protections be set in such a way that
only this special user can alter these files.
B  Building  PUNDIT  Images

B.1  Building  a  Core  PUNDIT  Image

To create a core PUNDIT image, execute the following sequence of steps:

1.  go to a directory to which you have write permission

2.  type to the UNIX prompt the command

    qprolog2.2 < /nlp/nlp/pundit/makePundit

    (Instead of qprolog2.2, you should use whatever command is necessary at your
    site to start up the current version of Quintus Prolog.)

Executing these steps will deposit in the current working directory a Prolog saved state
called Pundit.testimage, which is the core PUNDIT image.


B.2  Creating  a  Functional  Core  PUNDIT  Image

The core PUNDIT image itself is not functional (i.e., it cannot be used to parse sentences),
and is only used to build the domain-specific images. If, however, a user wishes to make
a functional image from a core PUNDIT image, the following steps should be executed:

•  Create a file containing the following Prolog code:

    % Turn on conjunction and translate the grammar
    :- gen_conj.
    :- translate_grammar('/nlp/nlp/pundit/translated_grammar.pl').
    :- compile('/nlp/nlp/pundit/translated_grammar.pl').
    :- compile('/nlp/nlp/pundit/muck/compute_types.pl').

    % These declarations are required for the Selection module
    pundit_domain(core).
    isa(nothing, nothing).
    semantic_type(nothing, nothing).

•  Start up the core PUNDIT image and compile the file containing the code above.

•  Save the resulting image (e.g., by executing the goal save_program('Pundit.newimage')).

Note that this image can be used only for parsing, since most of the procedures required
for semantic analysis (e.g. the knowledge base and semantics rules) are domain-specific.

B.3  Creating  a  Complete  Domain-Specific  Image 

To create a complete domain-specific image (in this case, an image for the MUCK domain),
follow these steps:

•  again, go to a directory to which you have write permission

•  type to the UNIX prompt the command

    /nlp/nlp/pundit/Pundit.testimage < /nlp/nlp/pundit/muck/makeMuck

    (This assumes that Pundit.testimage is currently in the directory /nlp/nlp/pundit.)

Executing these steps will deposit in the current working directory a Prolog saved state
called Muck.testimage, which is the complete domain image. Once the above procedure
has been completed, either of these two Prolog saved states can be started up simply
by typing Pundit.testimage or Muck.testimage to the UNIX prompt (or by typing the
absolute filename, if the user is not in the directory in which these files are found). The
images can, of course, be renamed if desired.
C  Customizing  Your  PUNDIT  User  Environment

Because PUNDIT is written in Quintus Prolog, we can use one of its features to make
it easy to customize PUNDIT for individual use. When Prolog first starts up, it checks
in the user's home directory for a file named prolog.ini. If such a file exists, Prolog
will compile it into its current image. Using this feature, we can instruct Prolog to
automatically set PUNDIT switches to those settings that we find most convenient.
Figure 10 is an example of one such prolog.ini. The example code first checks to see if
Prolog is running a PUNDIT image; if it is, switches are set to the desired settings (in this
case, to those most convenient for grammar development). Observe in particular that the
switch translated_grammar_in_use is turned on only if translated_grammar_present is
already on. At the end, a procedure is called which displays the current switch settings.

turn_on_initial_switches :-
    recorded(toggle, switches_are_defined, _),
    !,
    (toggle(translated_grammar_present) ->
        turn_on(translated_grammar_in_use) ;
        true),
    turn_on(parse_tree),
    turn_off(selection),
    ssucceed,
    turn_off(show_isr),
    turn_off(semantics),
    turn_off(text_mode),
    turn_off(summary),
    show_herald.
turn_on_initial_switches.

:- turn_on_initial_switches.


Figure  10:  Sample  prolog.ini  file 
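
The same pattern can be adapted to other working styles. The following is a hedged sketch of a variant prolog.ini aimed at semantics development rather than grammar development; it is not taken from the PUNDIT distribution. The predicate name turn_on_semantic_switches/0 is hypothetical, and the test recorded(toggle, switches_are_defined, _) is assumed to detect a PUNDIT image in the same way as in Figure 10.

```prolog
% Hypothetical prolog.ini variant (not part of PUNDIT): settings that may be
% convenient when developing semantics rules.
turn_on_semantic_switches :-
    recorded(toggle, switches_are_defined, _),  % only in a PUNDIT image
    !,
    turn_on(semantics),
    turn_on(decomposition_trace),   % trace semantic analysis
    turn_on(time_trace),            % trace temporal analysis
    status.                         % display the resulting switch settings
turn_on_semantic_switches.          % not a PUNDIT image: do nothing

:- turn_on_semantic_switches.
```

The second clause lets the file load quietly when Prolog is started without PUNDIT.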
D  PUNDIT  Files  and  Dependencies

D.1  Files

Listed below are the core and domain-specific files which make up the PUNDIT software
accompanying this document. By convention, domain-specific files are prefixed with the
name of the domain.

•  Core Files

   -  Lexical
      *  dictisr.pl - the core lexicon
      *  entries.pl - the Lexical Entry Procedure
      *  lookup.pl - lexical lookup
      *  reader.pl - procedures to read input
      *  readin.pl - load or update the lexicon
      *  shapes.pl - shape descriptors
      *  tables.pl - lexical entry options

   -  Syntax
      *  Grammar
         •  bnf.pl - bnf definitions
         •  compile_types.pl - [created automatically]
         •  compute_types.pl - compute atomic grammar nodes
         •  conj_restr.pl - grammar restrictions for conjunction
         •  count_off.pl - counting procedure
         •  count_on.pl - counting procedure
         •  counting.pl - procedures for grinding and counting
         •  interpreter.pl - grammar interpreter
         •  lspops.pl - elementary restriction operators
         •  meta.pl - meta grammar for conjunction
         •  path.pl - navigate the parse tree
         •  prune.pl - dynamic pruning of grammar options
         •  restrictions.pl - restrictions
         •  routines.pl - basic syntactic routines for grammar
         •  translated_grammar.pl - [created automatically]
         •  translator.pl - grammar translator
         •  types.pl - type definitions for grammar
         •  update.pl - grammar update procedures
         •  xor.pl - exclusive or mechanism for grammar options
      *  Intermediate Syntactic Representation
         •  compute_trans.pl - compute ISR
         •  isr_lexical.pl - ISR information for terminal symbols
         •  isr_ops.pl - ISR operator definitions
         •  semproc.pl - simplify ISR translation
         •  show_isr.pl - display procedures for the ISR
      *  Selection
         •  selection_dcg.pl - Selection DCG for analyzing ISR
         •  selection_query.pl - Selection user interface
         •  selection_restr.pl - restrictions which call Selection DCG
         •  selection_tools.pl - Selection tools
         •  selection_top_level.pl - record and erase parsed sentences
         •  selection_utilities.pl - Selection utilities

   -  Semantics
      *  adjunct_analysis.pl - analyze sentence adjuncts
      *  filter.pl - prepare ISR for semantic analysis
      *  np_int.pl - noun phrase semantics
      *  quantifiers.pl - quantifier binding procedures
      *  semantics.pl - the Semantic Interpreter
      *  world.pl - general knowledge base procedures

   -  Pragmatics
      *  discourse_rules.pl - manage discourse and focus information
      *  np_ext.pl - Reference Resolution
      *  time.pl - Time Analysis

   -  Database Application
      *  entry_generator.pl - create database relations

   -  Utilities
      *  access.pl - ISR accessor functions
      *  edit.pl - Prolog Structure Editor
      *  qprologl5.pl - code specific to Quintus Prolog
      *  rdb_remove.pl - remove entries from recorded database
      *  show.pl - display ISR, IDR, db relations, etc.
      *  switches.pl - manage PUNDIT switches
      *  testing.pl - software testing utility (not for MUCK)
      *  time_display.pl - temporal relations display procedures
      *  trace_messages.pl - semantics trace messages
      *  utilities.pl - general-purpose procedures
      *  vax_menus.pl - menu facility
      *  vax_show.pl - top-level non-window display procedures
      *  ws_support.pl - windowing system procedures

   -  Other
      *  compilePundit - build a PUNDIT image
      *  demo_top_level.pl
      *  op_defs.pl - operator declarations
      *  punt.pl - on-line PUNDIT help
      *  top_level.pl - PUNDIT front-end

•  Domain-Specific Files for the MUCK Domain

   -  Lexical
      *  muck_dictisr.pl - incremental lexicon
      *  muck_shapes.pl - shape descriptors

   -  Syntax
      *  Grammar
         •  compile_types.pl - [created automatically]
         •  muck_bnf.pl - updates to the core bnf file
         •  muck_restrictions.pl - restrictions
         •  translated_grammar.pl - [created automatically]
      *  Selection
         •  muck_selection_db.pl - selectional patterns
         •  SELECTIONAL_PATTERNS.pl - [created automatically by Selection]
         •  USER_CORPUS.pl - [created automatically by Selection]

   -  Semantics
      *  muck_rules.pl - semantics rules
      *  muck_world.pl - the knowledge base

   -  Pragmatics
      *  muck_time.pl - temporal operators and rules

   -  Database Application
      *  muck_entry_generator.pl - customized version of core file
      *  muck_db_structure.pl - database definition
      *  muck_db_mapping.pl - database mapping

   -  Summary Application
      *  muck_summary.pl - create summaries (empty file)

   -  Other
      *  compileMuck - build MUCK image
      *  muck_top_level.pl - message entry front-end
      *  muck_working.pl - message corpus

D.2  Dependencies 

While  most  PUNDIT  files  can  be  loaded  in  any  order,  certain  files  and  classes  of  files  must 
be  loaded  in  a  specific  order  for  pundit  to  run  correctly.  These  ordering  dependencies 
arise  for  three  main  reasons: 

1.  Compilation of domain-specific files is designed to follow compilation of
    domain-independent files. For example, certain core procedures may be abolished and
    redefined in a domain-specific file; if changes are made to the core file and it is recompiled
    in a domain image, the domain-specific file must be recompiled as well.

2.  Some of PUNDIT's data are stored in the Prolog internal database, and multiple
    compilations of certain files will result in duplicate database entries. The relevant
    files are: the core and domain-specific versions of the grammar and the lexicon
    (bnf.pl and dictisr.pl), and the domain selectional patterns and message corpus.

3.  Certain operations in PUNDIT are performed at compile time. These include
    meta-rules for the grammar, translating the grammar, and computing the types of
    nonterminals in the grammar. These operations must be done in order.

If, in the course of development, you wish to compile a new version of the grammar, lexicon,
selectional database or message corpus, you must first remove the internal database entries
generated by the compilation of the previous version. This can be done most simply by
calling the procedure rdb_remove (see Section 4), which removes all database entries of a
specified type.

Compiling changes to selectional patterns: selectional patterns reside in two files:
<domain>_selection_db.pl and SELECTIONAL_PATTERNS.pl. The latter is created
automatically in any directory in which you have run a PUNDIT image with the selection
switch on, while the former resides in the main domain directory, is maintained by hand,
and is compiled into the standard domain image. If you wish to retain the selectional
patterns which were originally compiled into the image and to add your personal selectional
patterns, compile <domain>_selection_db.pl and SELECTIONAL_PATTERNS.pl, in that
order. Otherwise, compile only the relevant file.

Compiling changes to the message corpus: the message corpus is not compiled into
either the core PUNDIT image or the domain image; instead, it is automatically compiled
into your image when you first invoke the pundit command. Therefore, if you have
modified this file, you need not recompile it yourself. The system supports personal
versions of the corpus: if the file <domain>_working.pl exists in the directory in which
you are running an image, that is the file which will be compiled. If it does not exist, the
file in the main domain directory will be compiled.

Loading changes to the lexicon: multiple lexicon files exist. The core PUNDIT lexicon
(dictisr.pl) resides in the core PUNDIT directory and is incorporated into the core
PUNDIT image; the domain-specific lexicon (<domain>_dictisr.pl) resides in the domain
directory and is incorporated into the domain image. Since domain images are built from
core images, a domain image contains lexical entries from both the core lexicon and the
domain lexicon, loaded in that order. In addition, you may have one or more personal
lexicon files created by using the Lexical Entry Procedure. Running rdb_remove to
remove lexical entries removes all lexical entries, regardless of the file in
which they originated. You will then need to use the readin procedure to load the
relevant lexicon files in sequence.

Implementing  changes  to  the  grammar: 

1.  Read in the new grammar file.

2.  Meta-rules: run gen_conj/0.

3.  Translate the grammar to Prolog: run translate_grammar/1, whose argument is a
    file name (generally translated_grammar.pl).

4.  Compile the translated grammar: compile the file named above.

5.  Compute the types of the grammar nonterminals: compile the file compute_types.pl.

These steps must be performed in the order listed, except that step 5 may be performed
any time after step 2. Step 2 may be skipped if you do not wish to parse sentences
containing conjunction. Skip both steps 3 and 4 if you wish to parse with the grammar
interpreted (at a significant performance loss). Generally speaking, you will always need
to recompile compute_types.pl.
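
Steps 2 through 5 above can be collected into one goal. The following is a minimal sketch under stated assumptions: the predicate name rebuild_grammar/2 is hypothetical, the code must be run in a PUNDIT image in which gen_conj/0 and translate_grammar/1 are defined, and step 1 (reading in the new grammar file, after removing the old internal database entries) is assumed to have been done already.

```prolog
% Hypothetical helper (not part of PUNDIT): run steps 2-5 in the documented
% order. TranslatedFile is where the translated grammar is written (generally
% translated_grammar.pl); TypesFile is the compute_types.pl file to compile.
rebuild_grammar(TranslatedFile, TypesFile) :-
    gen_conj,                            % step 2: apply the conjunction meta-rule
    translate_grammar(TranslatedFile),   % step 3: translate the grammar to Prolog
    compile(TranslatedFile),             % step 4: compile the translated grammar
    compile(TypesFile).                  % step 5: compute nonterminal types
```

A call might look like rebuild_grammar('translated_grammar.pl', 'compute_types.pl'). Because the conjunction meta-rule cannot be undone, omit the gen_conj goal if you do not want conjunction in the grammar.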

Compiling changes to files which do not update the recorded database: certain
files exist in core and domain-specific versions (e.g. shapes.pl and muck_shapes.pl).
The core versions reside in the core PUNDIT directory and are incorporated into the core
PUNDIT image; the domain-specific versions reside in the domain directory and are
incorporated into the domain image. Since the domain image is built from the core image,
domain-specific files are compiled on top of core files. If you are working in a domain
image and have changed a file which exists in both core and domain-specific versions, you
will need to recompile both, in that order. Otherwise, simply recompile the relevant file.
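
For the shapes files, the core-then-domain order might be expressed as follows. This is an illustrative sketch: the predicate name recompile_shapes/0 is hypothetical, and the paths assume the recommended installation directories from Appendix A.

```prolog
% Hypothetical helper (not part of PUNDIT): recompile a file that exists in
% both core and domain-specific versions, core version first.
recompile_shapes :-
    compile('/nlp/nlp/pundit/shapes.pl'),            % core version
    compile('/nlp/nlp/pundit/muck/muck_shapes.pl').  % domain version on top
```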
1  Introduction


1.1  Features 

The Lexical Entry Procedure (LEP) has been designed to provide consistency, completeness, and
speed of entry for new words. The procedure elicits relevant linguistic information from the user,
computes dependencies between attributes, and prompts for morphologically related forms
(offering a "guess" as to the correct form). The program then automatically creates a set of related
dictionary entries, with as much structure-sharing among the entries as possible. Before the entries
are actually entered in the database or written to a file, the user may inspect and edit any entries
created.

1.2  Limitations 

The  LEP  is  a  tool  which  relieves  the  user  of  some,  but  not  all,  of  the  burden  of  maintaining  a 
lexicon.  In  its  current  version,  it  can  only  be  used  to  add  new  lexical  entries,  and  cannot  be  used 
to  revise,  delete,  or  display  existing  lexical  entries.  Furthermore,  it  does  not  directly  access  a 
lexicon:  rather,  it  adds  lexical  entries  to  the  Prolog  database  in  a  running  image,  and  optionally 
copies  them  to  a  temporary  file.  The  user  must  move  the  entries  from  the  temporary  file(s)  created 
by the LEP to the appropriate lexicon.

What  this  means  to  you,  as  a  user,  is  that  you  will  need  to  become  familiar  with  the  LEP  (as 
documented  in  this  Guide),  and  you  will  also  need  to  understand  the  tools  for  editing  and  deleting 
raw  lexical  entries.  This  in  turn  means  that  you  will  need  to  understand  the  structure  and  content 
of  lexical  entries  in  the  form  in  which  they  are  stored  in  a  lexicon,  for  example  (for  the  word  dog): 

:(dog, root:dog, [n:[11, singular], 11:[ncount1]])

:(dogs, root:dog, [n:[11, plural]])

:(dog's, root:dog, [ns:[11, singular]])

:(dogs', root:dog, [ns:[11, plural]])

To  help  you  with  this  task,  we  have  included  a  number  of  sample  lexical  entries  created  by  the 
LEP,  and  have  included  a  brief  section  on  what  to  do  when  you  have  completed  the  LEP. 


2  Getting  Started 

2.1  How  to  access  the  LEP 

The  LEP  may  be  run  by  itself,  or  may  be  called  automatically  whenever  PUNDIT  encounters  an 
unknown  word. 

•  Standalone 

Simply type the command lep. at the Prolog prompt in a PUNDIT image. You will be
prompted for the word whose definition you wish to add.
•  During text processing

If you have set the PUNDIT switch enter_new_word* to on, the LEP will be invoked
automatically whenever PUNDIT encounters an unknown word in its input. When this happens, you
will be given the opportunity to respell the word, to add the word to the lexicon, or to abort
processing. If you choose to add the word, the LEP will be invoked. After you have completed
the LEP, your definitions will be added to the image, and PUNDIT will resume processing with
the new definitions.

The transcript below illustrates the first two options: respell and add. The input is
Ticonderoga attaked, where attaked is a misspelling, and ticonderoga is a proper name
which is not in the lexicon. The actual prompts which come up during lexical entry will be
discussed in detail shortly. For now, observe that in the illustration, PUNDIT has successfully
parsed the input after the spelling error was corrected and ticonderoga was defined.

| ?- turn_on(enter_new_word).

yes

| ?- parse.

sentence: ticonderoga attaked.

No Lexical Entries found for: ticonderoga
Choose one of the following items

1. respell  2. add  3. abort

Lexical Entry option: add

Defining the lexical entries for 'ticonderoga'.

Output to a file? [yes] :

Entries will be saved in the file muck_lexicon.pl.15Nov1956

Root form [ticonderoga] :

Other spellings [none] :

Word classes: name

Defining 'ticonderoga' as a proper noun
Singular possessive [ticonderoga's] :

The following lexical entries have been created:

:(ticonderoga, root:ticonderoga, [proper:[]])
:(ticonderoga's, root:ticonderoga, [ns:[11, singular]])
Enter? : yes

No Lexical Entries found for: attaked
Choose one of the following items

1. respell  2. add  3. abort

Lexical Entry option: respell
respell: attacked

continuing processing with respelled word(s) -- [attacked]

* For more information on PUNDIT switches, please consult [Ball 88].


2.2  General  conventions 

2.2.1  Meta-responses 

All LEP prompts accept meta-responses, which begin with the special character @ and end with
a period.

• @help. - ask for help.

• @help(<ITEM>). - ask for help on a menu item.

• @quit. - abandon the current definition.

• @prolog(<PROLOG COMMAND>). - execute Prolog command.
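
The meta-response syntax listed above can be illustrated with a short sketch. The LEP itself is written in Prolog; the Python below is not PUNDIT code, and the function name parse_meta_response and the pattern are hypothetical. It assumes the special character is @ and that a meta-response has the shape @name. or @name(argument).

```python
import re

# Assumed shape of a meta-response: "@", a name, an optional "(argument)",
# and a closing period. This pattern is an illustration, not the LEP's own reader.
META_PATTERN = re.compile(r"^@([a-z]+)(?:\((.*)\))?\s*\.$")


def parse_meta_response(text):
    """Return (name, argument) for a meta-response, or None for an ordinary response."""
    match = META_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

An ordinary answer such as geese is returned as None, so a caller would treat it as the response to the current prompt.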


2.2.2  Defaults 

Many  of  the  prompts  in  the  LEP  offer  defaults.  In  a  menu,  the  default  is  marked  with  an  asterisk; 
otherwise,  the  default  is  enclosed  in  square  brackets.  To  accept  a  default,  press  the  RETURN 
key.  Otherwise,  enter  your  response.  In  the  following  example,  the  user  has  overridden  the  default 
plural  form  for  goose,  but  accepted  the  default  singular  possessive  form. 


Plural  form  [gooses] :  geese 
Singular  possessive  [goose's]: 
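
The default rule can be sketched as follows. This is an illustration in Python rather than part of the LEP; the name resolve_response is hypothetical, and an empty string stands for pressing RETURN.

```python
def resolve_response(typed, default=None):
    """Return the default when the user just presses RETURN, else the typed text."""
    typed = typed.strip()
    if typed == "":
        if default is None:
            # Assumption: a prompt without a default requires an explicit answer.
            raise ValueError("this prompt has no default; a response is required")
        return default
    return typed


# The goose example: the plural default is overridden, the possessive default accepted.
plural = resolve_response("geese", default="gooses")
possessive = resolve_response("", default="goose's")
```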


2.2.3  Menus 

Certain prompts require a response from a fixed list of choices; these choices are shown as a menu
when you ask for help. There are two basic types of menus: those from which you can select only
one item, and those which allow you to select multiple items. The menu title will indicate which
is the case.


If  the  menu  requires  a  single  item  as  a  response,  you  may  enter  either  the  number  of  the  item,  or 
the  name  of  the  item.  If  the  menu  allows  multiple  items,  you  may  enter  the  numbers  or  the  names, 
separated  by  commas.  In  the  following  example,  the  user  is  selecting  the  word  classes  for  the  word 
fool,  which  is  both  a  noun  and  a  verb: 


Word classes: @help.

Choose one or more of the following

1. noun
2. name
3. verb
4. adjective
5. adverb
6. determiner
7. quantifier
8. preposition

Select: noun,verb

Note that the user could also have entered 1,3 instead.
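
A minimal sketch of how a menu response might be interpreted, accepting either item numbers or item names separated by commas. It is written in Python for illustration only and is not the LEP's implementation; parse_selection and its error handling are assumptions.

```python
# Menu items in the order shown by the Word classes help menu.
WORD_CLASSES = ["noun", "name", "verb", "adjective",
                "adverb", "determiner", "quantifier", "preposition"]


def parse_selection(response, items, multiple=True):
    """Turn a response such as "noun,verb" or "1,3" into a list of item names."""
    chosen = []
    for token in response.split(","):
        token = token.strip()
        if token.isdigit():
            index = int(token)
            if not 1 <= index <= len(items):
                raise ValueError("no menu item numbered %d" % index)
            name = items[index - 1]
        elif token in items:
            name = token
        else:
            raise ValueError("unknown menu item: %r" % token)
        if name not in chosen:
            chosen.append(name)
    if not multiple and len(chosen) != 1:
        raise ValueError("this menu accepts exactly one item")
    return chosen
```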

3 Defining New Words

3.1  Before  You  Begin 

The LEP assumes that you are adding new lexical entries, and it does not check to ensure the
entries are not already in the image you are running. If you want to be able to test your entries
by parsing with them in this image, you should first verify that the word you wish to enter is not
already defined. Do this by using the procedure edit_word, giving your word as argument. If
the word has already been defined, this procedure will display the existing lexical entries. If you
intend to completely replace them, delete each one. If you are sufficiently proficient, you can use
edit_word to make any necessary revisions instead of using the LEP (but there will be no external
record of your revisions).

If  your  word  has  not  already  been  defined,  or  if  you  are  not  concerned  about  creating  duplicates, 
proceed  with  the  LEP. 

3.2  Initial  Prompts 

The  first  five  prompts  in  the  LEP  are  common  to  all  lexical  entries: 

****************************** Lexical Entry ******************************

Word: fool

Defining the lexical entries for 'fool'.

Output to a file? [yes] :

Entries will be saved in the file muck_lexicon.pl.15Nov1956

Root form [fool] :

Other spellings [none] :

Word classes: noun,verb

•  Word 

Enter  the  word  which  you  wish  to  add.  What  you  enter  will  serve  as  the  default  for  the  Root 
form  prompt,  but  plays  no  other  role  at  present.  The  LEP  is  designed  entirely  around  root 
forms. 

•  Output  to  a  file? 

Answer  yes  if  you  wish  to  save  your  definitions  into  a  file  in  the  current  working  directory 
(the  name  of  the  file  is  automatically  generated).  If  you  answer  no,  your  definitions  will  be 
recorded  in  the  image  you  are  running,  but  there  will  be  no  external  record.  In  short,  your 
definitions  will  be  lost  when  you  exit  the  image. 


• Root form


The root is the most basic form of the word in the same grammatical category. If the word
is a verb, the root is the infinitive form; if the word is a noun, it is the singular form. If the
word you are defining is an abbreviation, enter the full form (e.g. for lb, enter pound). If the
word is a variant spelling, enter what you consider to be the standard spelling. Otherwise,
the root form is generally identical to the word itself.

Special problems arise when you need to define the root form of an acronym (e.g. unodir for
unless otherwise directed), a root which contains hyphens (anti-aircraft), or an idiomatic
phrase (go sinker). These are all treated as ‘multi-word expressions’, and this version of the
LEP cannot handle them. You will need to consult with a PUNDIT expert to determine how
to enter them (manually) into a lexicon.

• Other spellings

Here,  you  can  specify  any  variant  spellings  or  abbreviations  for  the  root,  such  as  sep,sept  for 
September,  or  archaeology  for  archeology. 

It is important to note that these are other forms of the root. Special problems arise when
you wish to define an abbreviation for a non-root form, for example clearng as an abbreviation
for clearing, or lbs for pounds. The LEP cannot handle these cases, and you will need to consult
with a PUNDIT expert.

•  Word  classes 

Enter the part(s) of speech of the root. For example, if the root can be both a noun and a
verb, enter noun,verb.

The next sections cover major word classes and their features in detail. The diagram below shows
the features and morphological information which are collected for each word class. Items enclosed
in { } are optional, while items enclosed in < > reflect information that the user may or may not
be asked to provide, depending on previous choices.

[Diagram not reproduced. Surviving labels: "Definite vs."; verb forms (1. 3rd person singular,
2. past tense, 3. past participle, 4. present participle); <prepositions>; <particles>.]


3.3  Word  Classes 
3.3.1  Nouns 

A noun is first classified as mass or count. If the noun is a count noun, the LEP prompts for number
information and plural form (it is assumed that the root is singular). For both count and mass
nouns, you will then be asked to specify the possessive forms. Sample definitions for woman (a
count noun) and mud (a mass noun) are given below.

Word classes: noun

Defining 'woman' as a noun

Count/Mass [count] :
Number [singular] :

Plural form [womans] : women
Singular possessive [woman's] :

Plural possessive [womans'] : women's

The following lexical entries have been created:

:(woman,root:woman,[n:[11,singular],11:[ncount1]])
:(women,root:woman,[n:[11,plural]])
:(woman's,root:woman,[ns:[11,singular]])
:(women's,root:woman,[ns:[11,plural]])

Enter? :yes

Note  that  there  is  a  lexical  entry  for  each  morphological  variant.  Each  lexical  entry  consists  of  the 
citation  form,  followed  by  the  root  form,  followed  by  a  list  of  lexical  classes  and  their  attributes. 
This data is intended to be read by the parser. In the first entry, n indicates a noun,
and ncount1 indicates a count noun. In the entries for woman’s and women’s, ns indicates possessive.
The occurrence of [11,...] in the entries is a pointer to the basic feature 11:[ncount1] in the
first entry, where the root is classified as a count noun. You may find it enlightening to consult
[Fitzpatrick 81] for a more detailed discussion of word classes such as ncount1 in the context of a
related  system. 
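
To make the relationship between the prompts and the resulting entries concrete, here is a minimal Python sketch that rebuilds the four entries for a count noun. It is not the LEP's code: the function name count_noun_entries is hypothetical, the default forms (root + s, root + 's, root + s') are inferred from the defaults shown in the sample prompts, and the class name is read as ncount1.

```python
def count_noun_entries(root, plural=None, singular_poss=None, plural_poss=None):
    """Build the four lexical entry strings for a count noun, as plain text."""
    # Naive defaults, mirroring the bracketed defaults in the sample prompts
    # (e.g. [womans], [woman's], [womans']); the user may override any of them.
    plural = plural or root + "s"
    singular_poss = singular_poss or root + "'s"
    plural_poss = plural_poss or root + "s'"
    return [
        # Root entry: carries the basic feature list 11, shared by the variants.
        ":(%s,root:%s,[n:[11,singular],11:[ncount1]])" % (root, root),
        ":(%s,root:%s,[n:[11,plural]])" % (plural, root),
        ":(%s,root:%s,[ns:[11,singular]])" % (singular_poss, root),
        ":(%s,root:%s,[ns:[11,plural]])" % (plural_poss, root),
    ]


# The woman example: plural and plural possessive are overridden.
entries = count_noun_entries("woman", plural="women", plural_poss="women's")
```

Only the root entry defines 11; the other three entries refer back to it, which is why all variants share the count-noun classification.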

A  sample  definition  for  the  mass  noun  mud: 

Word classes: noun

Defining 'mud' as a noun

Count/Mass [count] : mass
Singular possessive [mud's] :

3.3.2  Proper  Nouns 

You  will  be  asked  to  specify  the  singular  possessive  form.  Example: 

Word: philadelphia

Defining the lexical entries for 'philadelphia'.

Output to a file? [yes] :n
Root form [philadelphia] :

Other spellings [none] : philly,phila
Word classes: name

Defining 'philadelphia' as a proper noun
Singular possessive [philadelphia's] :

Note  that  the  name  has  been  entered  in  lowercase  letters.  If  it  had  been  capitalized,  the  LEP  input 
reader  would  have  converted  it  to  lowercase  anyhow.  This  is  because  PUNDIT  in  general  converts 
all  input  to  lowercase  letters,  so  it  would  be  useless  to  allow  the  distinction  to  be  made  in  the 
lexicon. 

3.3.3 Verbs

Defining the characteristics of a verb is perhaps the most daunting of all lexical entry tasks.
PUNDIT requires very detailed information about what types of complement a verb can take, and
what prepositions and particles the verb requires. This information is necessary to get correct
parses and avoid incorrect parses, but it is difficult to specify. Many of the distinctions amongst
complement types may be obscure to the non-linguist, but they are all significant. You may find
it useful to consult a dictionary such as Longman’s for guidance. Since it requires a great deal of
thought to determine the complement types of a verb, you may find it most efficient to work this
out on paper, before using the LEP to record your decisions.

We have tried to simplify the task of specifying the complement types of a verb by offering three
different menus: a menu for transitive uses of the verb, a menu for intransitive uses of the verb,
and a menu for verbs which take clausal complements. The items in each menu are numbered, and
you must choose one or more items by number.

Within each menu, you will be shown first the String Grammar name for the complement type,
and then a short description. You may request help on any item by typing @help(NUMBER)., where
NUMBER is the number of the item on the menu. The help messages give examples of verbs which
take this complement type, and some criteria for making a decision. All of the complement types
are discussed in more detail in [Linebarger 88], which is attached as an appendix to this guide.

After you have specified the complement types, you will be asked about tense and participial forms
of the verb. A sample definition for the verb think is shown on the following pages.

Word classes: verb

Defining 'think' as a verb

Takes a clausal complement? :yes

Choose clausal complement types, by number:

1. tovo - infinitival complement, raising-to-subject
2. eqtovo - infinitival complement, subject-controlled equi
3. objtovo - noun phrase + infinitival complement, object-controlled equi
4. ntovo - noun phrase + infinitival complement, raising-to-object
5. thats - 'that'-clause
6. assertion - 'that'-clause, but 'that' is optional
7. nthats - noun phrase + 'that'-clause
8. pnthats - prepositional phrase + 'that'-clause
9. svo - tenseless clause with no complementizer
10. clshould - 'that'-clause + subjunctive
11. pnthatsvo - prepositional phrase + clshould
12. snwh - indirect question
13. nsnwh - noun phrase + indirect question
14. sven - predicative 'small clause'
15. sobjbe - small clause with subject
16. dpsn - particle + clause
Clausal complement types: 5,6,15

Transitive? :y

Choose transitive complement types, by number:

1. nstgo - noun phrase (simple transitive verb)
2. npn - noun phrase + prepositional phrase
3. pnn - prepositional phrase + noun phrase
4. nn - double object dative
5. na - noun phrase + adjective phrase
6. dp2 - particle + noun phrase
7. dp3 - noun phrase + particle
8. dp2pn - particle + noun phrase + prepositional phrase
9. dp3pn - noun phrase + particle + prepositional phrase
Transitive complement types: 6,7

Intransitive? :y

Choose intransitive complement types, by number:

1. nullobj - no complement (simple intransitive verb)
2. pn - prepositional phrase
3. astg - adjective
4. dstg - takes specific adverbs
5. dp1 - particle
6. dp1pn - particle + prepositional phrase
Intransitive complement types: 1,2

Predicative verb? [no] :

Prepositions for the PN complement: about
Particles for the DP2 complement: up
Particles for the DP3 complement: up

3rd person singular [thinks] :
Past tense [thinked] : thought
Past participle [thought] :
Present participle [thinking] :

The following lexical entries have been created:

:(think,root:think,[v:[12],tv:[12,plural],
    12:[objlist:[nullobj,thats,assertion,sobjbe,pn:[pval:[about]],
    dp2:[dpval:[up]],dp3:[dpval:[up]]]]])

:(thinks,root:think,[tv:[12,singular]])

:(thought,root:think,[tv:[12,past],ven:[14],
    14:[12,pobjlist:[assertion,objbe,thats,dp1:[dpval:[up]],
    p:[pval:[about]]]]])

:(thinking,root:think,[ving:[12]])

Enter? :yes

3.3.4 Adjectives

For adjectives, PUNDIT needs to know whether the adjective can take a clausal complement and, if it
can, which of four complement types it takes. Help is available on each.

• ordinary

Takes a that-clause as a right modifier, e.g. I am glad that she won, and the syntactic
subject of the sentence is also the logical subject. The String Grammar name for this class
of adjective is asent3 (this is what you will see in the lexical entry created).

•  extraposition 

It... adjective that..., e.g. It is obvious that he is tired. The logical subject is the that-clause,
which appears to have been extraposed to the right, leaving it behind. For adjectives of this
type, there will often be acceptable versions with and without extraposition: It is obvious
that he is tired and That he is tired is obvious. The String Grammar name for this class of
adjective is asent1.

•  equi 

Like the ordinary complement type, except that the clause is infinitival instead of a
that-clause, and the subject of the sentence is also the understood subject of the infinitive.
Example: Bill is eager to please, which means that Bill wants to do the pleasing. The internal
name for this class of adjective is aasp:[equi_adj].

•  raising 

Like extraposition, except that the clause is infinitival, and the subject appears to have
been raised out of the clause. Example: She is certain to be re-elected. The logical subject
is the clause, with the syntactic subject put back into it: That she will be re-elected is
certain, It is certain that she will be re-elected. The internal name for this class of adjective
is aasp:[raising_adj].


3.3.5  Adverbs 

No  special  information  is  collected  for  adverbs. 


3.3.6  Determiners 

Determiners (articles) are classified according to definiteness and number: for example, a is
indefinite and singular, while the is definite and both singular and plural. A sample definition for
another, which is indefinite and singular:

Word classes: determiner

Defining 'another' as a determiner

Definiteness: indefinite
Number [singular] :

3.3.7 Quantifiers

Quantifiers are classified according to number. A sample definition for many is given below:

Word classes: quantifier

Defining 'many' as a quantifier
Number [singular] : plural

3.3.8  Prepositions 

No  special  information  is  collected  for  prepositions. 

3.4  Completing  the  Lexical  Entry  Process 

After the LEP has collected all attributes of the root, it will display the lexical entries created, and
you will be asked whether you wish to enter them. If at this point you realize that you have made
an error in lexical entry, you can simply type @quit. and start over again - nothing has yet been
saved to a file or added to the Prolog database.

If you answer yes, the entries will be recorded in the current image (and written to a file, if you
so specified). You will then be given the opportunity to define more words.

If you answer no, you will be shown each of the entries created, one at a time, and you may choose
to enter it, ignore it, or edit it. At this point you can also quit and start over again. Note that no
action is taken until one of these choices is made for each of the entries. If you choose ignore, the
entry will be thrown away. If you choose edit, you will enter the Prolog Structure Editor. This
is a tool which requires some expertise, since you will be directly editing the raw lexical entry -
consult [Riley 86] for more details. If you get into the Structure Editor by mistake, type ? to see
the options - one of these will be a, to abort.


4  Beyond  Lexical  Entry 

If you answered yes to Output to a file?, the lexical entries you created will now be in a
temporary file in your directory, and you will eventually need to move them into the appropriate
lexicon file. This must be done manually.

Before you move your lexical entries to a more permanent home, however, it is advisable to test them by
devising test sentences and using PUNDIT to parse them. In the sections which follow, we describe
how to test and correct errors at three different stages: before exiting your current image, after
exiting the image but before moving the entries to a lexicon, and after moving the entries. The
well-known rule applies here: the earlier you detect an error, the easier it is to fix it.

4.1 Testing Your Lexical Entries Before Exiting the Image

After  you  have  completed  the  LEP,  the  lexical  entries  which  you  created  have  been  stored  in  the 
Prolog  database  and  are  thus  available  to  the  PUNDIT  parser.  This  is  the  most  convenient  stage  at 
which  to  test  the  correctness  and  completeness  of  your  entries. 

Before you begin parsing, there are at least three PUNDIT switches you may wish to adjust:
text_mode, parse_tree and semantics. Turn the text_mode switch off - this will enable you
to obtain all the parses for your input. Turn the parse_tree switch on - so that you can see
exactly how PUNDIT has analyzed your input. The semantics switch you may wish to turn off: if
you have defined a verb, semantic analysis will not work properly until you have also defined the
semantics rules for the verb; if you have defined a noun, semantic analysis will not work properly
until you have defined the corresponding concept in the knowledge base.

As a result of testing, you may find that some aspect of your lexical entry was incorrect or
incomplete. At this point there are several ways in which you can correct your error.

1.  Exit  the  image  and  start  all  over  again  from  scratch. 

2. Use edit_word to delete the entire set of lexical entries from the Prolog database, and use
the LEP to redefine them.

3. Use edit_word to revise the offending lexical entries. This option is not recommended for
novices.

[Riley 86] explains how to use edit_word, but here is a simple example showing how to delete all
the lexical entries for a given root.

| ?- edit_word(dog).

Editing a set of words with the same root

Word 1: :(dog,root:dog,[n:[11,singular],11:[ncount1]])
Word 2: :(dog's,root:dog,[ns:[11,singular]])
Word 3: :(dogs,root:dog,[n:[11,plural]])
Word 4: :(dogs',root:dog,[ns:[11,plural]])

Command: d4

Word number:4 is marked to be deleted. You may no longer edit it.

Editing a set of words with the same root

Word 1: :(dog,root:dog,[n:[11,singular],11:[ncount1]])
Word 2: :(dog's,root:dog,[ns:[11,singular]])
Word 3: :(dogs,root:dog,[n:[11,plural]])
Word 4: :(dogs',root:dog,[ns:[11,plural]])

Command: d3

Word number:3 is marked to be deleted. You may no longer edit it.

Editing a set of words with the same root

Word 1: :(dog,root:dog,[n:[11,singular],11:[ncount1]])
Word 2: :(dog's,root:dog,[ns:[11,singular]])
Word 3: :(dogs,root:dog,[n:[11,plural]])
Word 4: :(dogs',root:dog,[ns:[11,plural]])

Command: d2

Word number:2 is marked to be deleted. You may no longer edit it.

Editing a set of words with the same root

Word 1: :(dog,root:dog,[n:[11,singular],11:[ncount1]])
Word 2: :(dog's,root:dog,[ns:[11,singular]])
Word 3: :(dogs,root:dog,[n:[11,plural]])
Word 4: :(dogs',root:dog,[ns:[11,plural]])

Command: d1

Word number:1 is marked to be deleted. You may no longer edit it.

Editing a set of words with the same root

Word 1: :(dog,root:dog,[n:[11,singular],11:[ncount1]])
Word 2: :(dog's,root:dog,[ns:[11,singular]])
Word 3: :(dogs,root:dog,[n:[11,plural]])
Word 4: :(dogs',root:dog,[ns:[11,plural]])

Command: t

Do you want to delete: :(dog,root:dog,[n:[11,singular],11:[ncount1]]).

Enter 'y' or 'n': y

Do you want to delete: :(dog's,root:dog,[ns:[11,singular]]).

Enter 'y' or 'n': y

Do you want to delete: :(dogs,root:dog,[n:[11,plural]]).

Enter 'y' or 'n': y

Do you want to delete: :(dogs',root:dog,[ns:[11,plural]]).

Enter 'y' or 'n': y

4.2  After  Exiting 

After  you  have  exited  from  the  image  in  which  you  were  using  the  LEP,  your  entries  will  now  reside 
only  in  the  temporary  file  created  by  the  LEP.  At  this  point  you  may  wish  to  test  them  (if  you 
have  not  already  done  so),  or  you  may  wish  to  load  them  into  an  image  for  some  other  purpose. 

To load your lexical entries from a file into an image, use the procedure readin, whose argument
is the name of the file containing your lexical entries. For example:

| ?- readin('muck_lexicon.pl.18Nov1918').

If, however, some of your lexical entries are intended to replace definitions which are already in
the image, you should first remove the old lexical entries. To do this, you can use edit_word and
delete them one at a time. Or, if you are not sure which words are already defined, you can remove
all lexical entries from the image by using the procedure rdb_remove(dict) (see [Ball 88] for more
information).

If you discover errors in your lexical entries at this stage (that is, while your entries are still in a
temporary file created by the LEP), you can simply remove the temporary file, delete the entries
from your image (using edit_word) and use the LEP to re-enter your definitions.

You may also discover errors in lexical entries after they have been moved to a lexicon file, and
that file has been used to build an image. If the image is your own personal image, you can simply
rebuild it after fixing the problem (and updating the lexicon). If the image is shared, you may
need to follow whatever system administration procedures obtain at your site.

The Lexical Entry Procedure (LEP) was extensively revised in Version 1.1 to improve ease of use by
non-experts. In addition, minor bugs were corrected, and several obsolete prompts were removed.

In Version 1.2, a new lexicon display facility has been added, and the Lexical Entry Procedure has
been extended to allow the entry of ‘multi-word expressions’. These changes are described below.


1  Lexical  Display 

In order to make the PUNDIT lexicon accessible to the ordinary user, a lexical display facility has
been developed. In this version of PUNDIT, the display is accessed by the SRE, and can also be run
stand-alone, using the following Prolog commands:

• lex_display_all.

Use this command to display the entire lexicon in the current image.

• lex_display_all(WordClass).

Use this command to display the definitions of all words in a specified word class. For
example, to display all the verbs in the current lexicon, type:

lex_display_all(verb).

• lex_display(Word).

Use  this  command  to  display  the  definition  for  a  single  word.  If  the  word  is  a  root  form 
(for  example,  the  infinitive  of  a  verb  or  the  singular  form  of  a  noun),  all  the  variants  of  the 
root  form  will  be  displayed.  Otherwise,  only  information  for  the  particular  word  form  will 
be  displayed. 

For example, the display for the root form attack:

attack [noun,verb]

n. count singular; pl. attacks; sing. poss. none; pl. poss. none
v. present sing. attacks, pl. attack; past attacked
   past part. attacked; pres. part. attacking
vt.
   nstgo - They attacked it
   npn - They attacked it [with] something
vi.
   nullobj - It attacked
   pn - They attacked [on] something

The display for the non-root form attacked:

attacked (root: attack) [verb]

v. [past, past part]

The displays show the definition of a word in what is intended to be a helpful and legible format,
using  essentially  the  same  terminology  as  the  Lexical  Entry  Procedure.  For  verb  complement  types 
(such  as  nstgo  in  the  example  above),  templates  are  used  to  generate  example  sentences.  Into 
these  templates  are  inserted  the  past  tense  form  of  the  verb  (e.g.  attacked)  and  any  prepositions 
or  particles  which  were  specified.  The  latter  appear  in  brackets  in  the  examples  (e.g.  [with]  was 
specified  as  the  valid  preposition  for  the  npn  complement  of  attack). 
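
The template mechanism described above can be sketched in a few lines. This Python is illustrative only and is not the display procedure's implementation; the TEMPLATES table and the function example_sentence are hypothetical, and the four templates are inferred from the sample display for attack.

```python
# Hypothetical template table, one entry per complement type seen in the
# sample display. {verb} is the past tense form, {prep} a bracketed preposition.
TEMPLATES = {
    "nstgo": "They {verb} it",
    "npn": "They {verb} it {prep} something",
    "nullobj": "It {verb}",
    "pn": "They {verb} {prep} something",
}


def example_sentence(complement, past_tense, preposition=None):
    """Fill the template for a complement type with the verb and preposition."""
    template = TEMPLATES[complement]
    if "{prep}" in template and preposition is None:
        raise ValueError("complement type %s needs a preposition" % complement)
    # User-specified prepositions are shown in square brackets in the display.
    return template.format(verb=past_tense, prep="[%s]" % preposition)
```

For instance, example_sentence("npn", "attacked", "with") yields the npn line of the attack display.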

The  current  pundit  lexicon  contains  a  number  of  lexical  integrity  problems:  for  example,  some 
words  have  roots  which  are  undefined;  some  verbs  have  invalid  complement  types;  some  entries 
have  typing  errors  which  make  the  entry  unreadable  to  the  display  procedure.  When  one  of  these 
conditions  is  encountered,  the  display  procedures  print  out  a  generic  error  message.  The  actual 
error  can  be  pinpointed  (if  desired)  by  running  the  lexicon  integrity  checker,  which  has  been 
separately  developed  and  documented. 

Please  be  aware  that  this  version  of  the  lexical  displays  is  incomplete  and  deficient  in  several 
respects.  We  are  only  displaying  information  in  the  lexicon  which  the  Lexical  Entry  Procedure 
understands,  and  it  turns  out  that  this  is  a  subset  of  the  actual  information  in  the  pundit  lexicon. 
In  addition,  we  are  currently  unable  to  provide  the  correct  treatment  of  words  which  have  more 
than  one  root  form,  and  we  are  not  showing  ‘other  spellings’  of  words.  The  next  version  of  the 
displays  will  remedy  these  shortcomings. 

It  still  remains  possible  to  obtain  a  display  of  the  raw  physical  database,  if  you  wish.  Two  PUNDIT 
procedures  exist  which  may  be  used  for  this  purpose: 

• show_lex.

This procedure displays all the lexical entries in the Prolog recorded database, exactly in the
form in which they are stored.

• edit_word(Word).

This procedure, which is documented separately, can be used to edit the raw lexical entries
in the Prolog recorded database for a specified word. But since it first displays all the lexical
entries which have the same root as the specified word, it can be used as the ‘physical’
equivalent of the logical view offered by lex_display(Word).

2  Lexical  Entry  Procedure 

The LEP has been enhanced to allow the entry of ‘multi-word expressions’ such as anti-submarine
rocket. These are stored in the lexicon as single words joined by circumflexes, as in
the following lexical entry:

:(asroc,root:anti^(-)^submarine^rocket,[n:[11,singular],11:[ncount1]]).

To enter such expressions using the LEP, simply enter them in the form in which they would appear
in text, e.g.:

Word: anti-submarine rocket

The LEP will transform this into the form in which it must be stored in the lexicon.
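
A minimal sketch of such a transformation, in Python for illustration only. It is not the LEP's code: the function name to_lexicon_form is hypothetical, and it assumes the stored form of the example reads anti^(-)^submarine^rocket, that is, words joined by circumflexes with each hyphen kept as a separate (-) element.

```python
def to_lexicon_form(expression):
    """Join the words of a multi-word expression with circumflexes."""
    tokens = []
    # PUNDIT converts input to lowercase, so the sketch does the same.
    for word in expression.lower().split():
        parts = word.split("-")
        for position, part in enumerate(parts):
            if position > 0:
                tokens.append("(-)")  # assumed representation of a hyphen
            if part:
                tokens.append(part)
    return "^".join(tokens)


stored = to_lexicon_form("anti-submarine rocket")
```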

1 Introduction

This document describes the current object options of the grammar, with the corresponding passobj
(passive object) options and ISRs (Intermediate Syntactic Representations - see below), and with
some very limited annotations on their structural quirks, semantics, raison d’etre, and so forth.
The numbering of object options below is the same as that in the Lexical Entry Procedure, and
these notes are intended for use during entry of new lexical items. Object options which are
restricted to one or two verbs (such as BE_AUX, VENO, and VO, associated with the auxiliaries
be, have, and modals) are not included in this list, because we assume that most verbs with these
subcategorizations have already been entered in the lexicon. Such object types may be assigned
to a new verb by choosing Other in the Lexical Entry Procedure menu.

1.1  Handling  of  Passive  in  the  Lexicon 

The parse tree built by PUNDIT represents surface structure; transformations such as passivization
and wh-movement are not ‘undone’ at this level. Thus verbs must be subcategorized for the objects
they take in both active and passive. (Note on terminology: objects of the verb in its active form are
called object; the list of a verb’s objects in the lexicon is called the objlist. Similarly, passive
objects are called passobj, and the list of a verb’s passive objects in the lexicon is called the
pobjlist. Note the systematic ambiguity of the word ‘object’.) Because the correlation between
an active and a passive object is predictable, the Lexical Entry Procedure automatically computes
the passobj on the basis of the active objects selected. Verbs which do not passivize receive no
pobjlist whatsoever in the lexicon; they should not be subcategorized for NULLOBJ in the passive.
The by-phrase, if present, is parsed as a sentence adjunct rather than a passobj. Note that
some active object options (e.g., NULLOBJ) never passivize and so are never associated with a
corresponding passive object, while others may or may not be. Since the Lexical Entry Procedure
automatically computes the corresponding passobj for any object type which passivizes, it is up
to the user to edit out of the lexical entry any unacceptable passobj.
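
The automatic computation described here can be pictured as a table lookup. The following Python sketch is not the Lexical Entry Procedure's own code: the table PASSOBJ_OF and the function compute_pobjlist are hypothetical names, and the few pairings shown are taken from the object option descriptions in Section 2.

```python
# Active object option -> passobj options, as listed in Section 2.
# An empty list means the option never passivizes.
PASSOBJ_OF = {
    "NULLOBJ": [],
    "NSTGO": ["NULLOBJ"],
    "NPN": ["PN"],
    "PNN": ["PN"],
    "NTOVO": ["TOVO"],
    "OBJTOVO": ["EQTOVO"],
    "NN": ["NSTGO"],
    "SOBJBE": ["OBJBE"],
    "NA": ["ASTG"],
}

def compute_pobjlist(objlist):
    """Return the passive objects implied by a verb's active objects.

    Returns None (no pobjlist at all) when nothing passivizes,
    rather than a list containing NULLOBJ.
    """
    pobjlist = []
    for obj in objlist:
        for passobj in PASSOBJ_OF.get(obj, []):
            if passobj not in pobjlist:
                pobjlist.append(passobj)
    return pobjlist or None

print(compute_pobjlist(["NSTGO", "NPN"]))   # ['NULLOBJ', 'PN']
print(compute_pobjlist(["NULLOBJ"]))        # None
```

A lookup of this kind overgenerates for verbs that take a passivizable object type but do not themselves passivize, which is why the user must still remove any unacceptable passobj by hand.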

1.2  The  ISR 

Although the parse tree represents surface structure, the ISR is a somewhat more abstract level of
syntactic representation, which, like the ‘deep structure’ of transformational grammar, provides a
more transparent representation of argument structure. For example, the surface subject of the
passive is represented in the ISR as the object of the verb. As in many current syntactic theories,
the subject position of a passive ISR remains unfilled (in PUNDIT, it is filled with the dummy
element passive), and it is the function of semantic rules to determine whether an element in a
by-phrase may fill the semantic role which would be assigned to the subject. Thus, at least for the
object, active and passive sentences can be interpreted by the same semantic mapping rules. In
some cases, the ISRs of passive sentences diverge significantly from the surface structure in order to
bring about this parallelism between active and passive; for example, the ISR for a pseudopassive
such as The patient was operated on reconstructs the prepositional phrase. Thus the surface parse
tree provides the bare preposition on as object of the verb, while the ISR provides the prepositional
phrase on the patient as object.

The ISR also fleshes out the argument structure of constructions such as equi and raising, as seen in
connection with object types EQTOVO, TOVO, and OBJTOVO below; and it regularizes the surface
order of object types which differ from one another only in the order of their components (such as
NPN and PNN, or DP2 and DP3).

Because there are such divergences between the ISR and the surface parse, and because the ISR
plays an important role in the interface between syntax and semantics, the ISRs associated with
each object type and its passivized counterpart are given below. For ease of exposition, only the
prettyprinted ISR is displayed.
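
To make the layout of the prettyprinted ISRs easier to read, here is a small Python sketch that prints a nested structure in a similar label-and-value style. The dictionary representation and the function pretty_isr are assumptions made for illustration; PUNDIT's own ISR is a Prolog term, and its prettyprinter may differ in detail.

```python
def pretty_isr(isr, indent=0):
    """Return a list of lines, one label per line.

    A nested clause (a dict value) is printed on the lines after
    its label, indented by one further level.
    """
    lines = []
    pad = " " * indent
    for label, value in isr.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{label}:")
            lines.extend(pretty_isr(value, indent + 6))
        else:
            lines.append(f"{pad}{label}: {value}")
    return lines

isr = {
    "OPS": "present",
    "VERB": "want",
    "SUBJ": "the field^engineer (sing)",
    "OBJ": {
        "OPS": "untensed",
        "VERB": "repair",
        "SUBJ": "the field^engineer (sing)",
        "OBJ": "the disk^drive (sing)",
    },
}
print("\n".join(pretty_isr(isr)))
```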


1.3  On  pvals  and  dpvals 

Object types containing prepositions can be subcategorized for particular prepositions, via pval
sublists in the lexicon; object types containing particles can be subcategorized for specific particles
via dpval lists in the lexicon. The Lexical Entry Procedure queries the user to create these lists
where appropriate.
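
A minimal Python sketch of how such a sublist restricts an object option follows. The dictionary layout, the entries for operate and blow, and the function allows are hypothetical and do not reproduce the format of PUNDIT's lexicon.

```python
# Hypothetical lexicon fragment: object option -> permitted prepositions
# (pval) or particles (dpval). A missing list means "no restriction recorded".
LEXICON = {
    "operate": {"PN": {"pval": ["on"]}},
    "blow": {"DP2": {"dpval": ["up"]}},
}

def allows(verb, option, word):
    """True if verb is subcategorized for option with this word."""
    options = LEXICON.get(verb, {})
    if option not in options:
        return False
    entry = options[option]
    permitted = entry.get("pval") or entry.get("dpval")
    return permitted is None or word in permitted

print(allows("operate", "PN", "on"))      # True
print(allows("operate", "PN", "under"))   # False
```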


2  Object  Options 

2.1  NULLOBJ 

A verb which takes no complement at all is subcategorized for NULLOBJ. Example: The pump
failed, which receives the following ISR:

OPS:   past
VERB:  fail
SUBJ:  the pump (sing)

Such  verbs  do  not  passivize,  hence  there  is  no  corresponding  passobj. 


2.2  NSTGO 

This is the simple transitive verb option, a noun phrase non-predicative direct object. Example:
She repaired the sac, which receives the following ISR. The direct object receives the semlabel obj.
(Semlabels are applied to elements in the ISR to label those grammatical functions which play a
role in semantic rules. In the prettyprinted ISRs, the semlabels of all postverbal elements appear
in capital letters, e.g. SUBJ: in the example below.)


OPS:   past
VERB:  repair
SUBJ:  pro: she (sing)
OBJ:   the sac (sing)

The passobj counterpart of NSTGO is NULLOBJ, as in The sac was repaired (by her). The
by-phrase is parsed as a sentence adjunct; this is not evident in the ISR below because the ISR (for
reasons having to do with the functioning of the semantic interpreter) fails to indicate whether a
prepositional phrase occurs as a sentence adjunct or a verb object.

OPS:   past
VERB:  repair
SUBJ:  passive
OBJ:   the sac (sing)
PP:    by
       pro: her (sing)

Note that the surface subject is represented as the object in the ISR. The subject position of the
ISR is filled with the dummy element passive.
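
The mapping from a passive surface clause to its ISR can be sketched in Python as follows. The function passive_isr and its dictionary output are illustrative assumptions, not PUNDIT's ISR rules.

```python
def passive_isr(ops, verb, surface_subject, by_phrase=None):
    """Build a simplified ISR for a passive clause with passobj NULLOBJ.

    The surface subject becomes OBJ, and SUBJ is filled with the
    dummy element 'passive'. A by-phrase, if present, is kept as PP.
    """
    isr = {"OPS": ops, "VERB": verb, "SUBJ": "passive",
           "OBJ": surface_subject}
    if by_phrase is not None:
        isr["PP"] = ("by", by_phrase)
    return isr

# "The sac was repaired by her."
print(passive_isr("past", "repair", "the sac (sing)", "pro: her (sing)"))
```

Because the object slot is filled the same way as in the active sentence, a semantic rule written for the OBJ of repair would apply unchanged to both.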

2.3  PN 

This is a prepositional phrase object. Example: They operated on him:

OPS:   past
VERB:  operate
SUBJ:  pro: they (pl)
PP:    on
       pro: him (sing)

Corresponding passobj: isolated preposition. Example: He was operated on; in the ISR, the
prepositional phrase is reconstructed:

OPS:   past
VERB:  operate
SUBJ:  passive
PP:    on
       pro: he (sing)
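
The reconstruction of the prepositional phrase can be illustrated with a short Python sketch. The function pseudopassive_isr and the tuple used for the PP are assumptions for the sake of the example, not the form of PUNDIT's ISR rule.

```python
def pseudopassive_isr(ops, verb, surface_subject, stranded_prep):
    """Rebuild the prepositional phrase for a pseudopassive.

    The surface parse has only the bare preposition as passobj;
    the ISR pairs it with the surface subject.
    """
    return {
        "OPS": ops,
        "VERB": verb,
        "SUBJ": "passive",
        "PP": (stranded_prep, surface_subject),
    }

# "He was operated on."
isr = pseudopassive_isr("past", "operate", "pro: he (sing)", "on")
print(isr["PP"])   # ('on', 'pro: he (sing)')
```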

When do we want PN to be analysed as an object option rather than a sentence adjunct (SA)? As
far as I can tell, the following are the most relevant cases in which the PN object is subcategorized
for in this system:

(a)  The verb is unacceptable with NULLOBJ, and PN will suffice. E.g., *He told (ignoring
elliptical reading). But He told of great adventures.

(b)  The VERB + PN has an idiomatic meaning (or just feels like a unit); the surgeon operates
on the patient and the surgeon operates on the table represent, under their most plausible
readings, the PN object and SA attachments respectively. Similarly: Bill turned into the
side street (SA expressing where he turned) vs. Bill turned into an orangutan (PN
object).

The possibility of a pseudopassive doesn’t seem to be a motivating factor; sleep in our lexicon isn’t
subcategorized for in or on, etc., yet you can say That bed was slept in by George Washington or
This floor has been slept on by countless fatigued partygoers. If a verb with PN object can passivize
at all, as above, its passobj will be a P (at the moment this passobj is not listed under very
many verbs in the lexicon). Thus it is currently an unsolved problem how to treat pseudopassives
corresponding to active sentences in which the PN is in SA, as in the sleep example above: we don’t
really want to allow P as an SA option generally. So another possibility would be to allow PN object
(with no subcategorization for specific lexical items) more freely, automatically generating the PN
object possibility for ANY verb which allows pseudopassive. The cost of this is that we lose the
way of structurally representing differences such as that between, e.g., operate on the table and
operate on the patient.

2.4  NPN 

and 

2.5  PNN 

NPN consists of an NSTGO followed by a PN, as in They returned the disk drive to the factory:

OPS:   past
VERB:  return
SUBJ:  pro: they (pl)
OBJ:   the disk^drive (sing)
PP:    to
       the factory (sing)

See above for discussion of when to include the PN in object rather than SA. Another criterion: is
there a corresponding PNN object? PNN is the BNF node associated with NPN which has undergone
a shifting of the NP, constrained by various stylistic factors such as heaviness. It’s one of the
unpleasant facets of the grammar we use that this extraposition gets expressed as a different BNF
node. Subcategorization for PNN follows redundantly from subcategorization for NPN, since the
acceptability of PNN depends not on the verb but on the NP itself. (Compare He presented to us
an enormous chocolate cake iced with yellow daffodils vs. the much less pleasing He presented to
us a cake.)

Note that a sequence of NP + PN need not be parsed as NPN; for example, I found Louise in a
state of euphoria should probably be classed as a SOBJBE (see below), given related sentences such
as I found Louise euphoric, I found Louise a changed woman. The PN here is predicated of Louise
rather than simply being an argument of find. In contrast, the PN in I found Louise on the fourth
try seems more like an SA describing the circumstances of the event of finding Louise, certainly not
a predication stating that Louise was on the fourth try.

The passobj counterpart of NPN/PNN is PN, as in The disk drive was returned to the factory:

OPS:   past
VERB:  return
SUBJ:  passive
OBJ:   the disk^drive (sing)
PP:    to
       the factory (sing)


(Compare *The factory was returned the disk drive to: no pseudopassive is possible here except
with idiomatic expressions such as He was given a talking to.)
2.6  OBJBE


OBJBE, the object type associated with be as a main verb, is subcategorized for by verbs other
than be. OBJBE expands to an NP, an adjective phrase, or a PP; not every verb allows all these
expansions, as indicated by bvals in the lexicon. (The Lexical Entry Procedure does not currently
solicit bvals.) Examples: The pump appears inoperative:


OPS:   present
VERB:  appear
SUBJ:  the pump (sing)
ADJ:   inoperative

and She became a field engineer:

OPS:    past
VERB:   become
SUBJ:   pro: she (sing)
PREDI:  a field^engineer (sing)

These verbs don’t passivize at all, so they have no passobj counterpart (and hence no pobjlist is
created for them by the Lexical Entry Procedure).

Thus an NP following the verb can be analysed either as an NSTGO (He photographed the President’s
advisor) or as an OBJBE (He became the President’s advisor). This enforces the well-known fact
that predicative verbs do not passivize: The best cars are made by the Japanese (active form:
NSTGO) vs. *The best cooks are made by Italians (active form: OBJBE).


2.7  EQTOVO 

An example of EQTOVO is The fe wants to repair the disk drive. EQTOVO corresponds to what is
traditionally known as an infinitival complement with subject-controlled equi; the subject of the
matrix verb is understood to be also the subject of the infinitive. This is made explicit in the ISR,
where the matrix subject is copied into the infinitive; the ID variables for the two NPs are identical
(a fact which is obscured below because the ISR prettyprinter does not display variables):


OPS:   present
VERB:  want
SUBJ:  the field^engineer (sing)
OBJ:   OPS:   untensed
       VERB:  repair
       SUBJ:  the field^engineer (sing)
       OBJ:   the disk^drive (sing)

There  is  no  passobj,  as  these  structures  do  not  passivize. 
2.8  TOVO


An example of TOVO is The pump seems to be failing. The TOVO object corresponds to what is
traditionally known as raising; the matrix subject is analysed as an argument of the infinitive, but
not of the matrix verb, which has the infinitive as its sole argument. This is made explicit in the
ISR, where the reconstructed infinitival clause is the subject:

OPS:   present
VERB:  seem
SUBJ:  OPS:   untensed, prog
       VERB:  fail
       SUBJ:  the pump (sing)

As  for  passobj,  raising  verbs  don’t  passivize,  so  there  is  no  pobjlist. 

As noted above, these two object types EQTOVO and TOVO differ in their argument structure, and
hence in their selection properties, differences which are made explicit in the ISR. In the EQTOVO
(equi) case, the phonologically null subject of the infinitive undergoes selection with the matrix
verb as well as with the verb in the infinitive. That is, the fe is really the subject of both want and
repair in The fe wants to repair the disk drive. One can run afoul of selection restrictions between
this noun and either verb: The number 12 wants _ to be divisible by 3, and The cat wants _ to be
divisible by 3 are both anomalous, due to violations of selection between the matrix subject and
the matrix and embedded predicates, respectively.

For the bare TOVO case, the matrix subject is semantically just the subject of the lower verb; that
is, the matrix verb is really a one-place predicate with a clause as its argument. (Thus the ISR
subject of The pump seems to be failing is not the pump but the pump to be failing.) There’s no
selection between the surface NP subject (the pump) and this matrix verb (seem); whatever can
be subject of the infinitival verb V can also be subject of seem to V. Sager refers to these as
aspectual verbs. They include: seem, appear, start, tend, continue, come (as in It came to rotate,
NOT as in I come to bury Caesar, not to praise him. The latter is a purposive TOVO in SA.)

To summarize: with EQTOVO, the matrix subject is an argument of the matrix verb and also of the
verb in the infinitive; with TOVO, the matrix subject is an argument only of the lower (infinitival)
verb. (The two types correspond to equi and raising, respectively.)
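
The difference in selection properties can be made concrete with a toy Python sketch. The tables REQUIRES and CLASS_OF, the semantic classes in them, and both function names are invented for illustration; they do not describe how PUNDIT's semantic component checks selection.

```python
# Toy selection table: predicate -> semantic class required of its subject.
REQUIRES = {"want": "animate", "be divisible": "number", "fail": "machine"}
CLASS_OF = {"the cat": "animate", "the number 12": "number",
            "the pump": "machine"}

def selecting_predicates(option, matrix, lower):
    """Predicates of which the surface subject is an argument."""
    if option == "EQTOVO":   # equi: argument of both predicates
        return [matrix, lower]
    if option == "TOVO":     # raising: argument of the lower predicate only
        return [lower]
    raise ValueError(f"unsupported object option: {option}")

def violations(option, subject, matrix, lower):
    """Predicates whose selection restriction the subject violates."""
    return [p for p in selecting_predicates(option, matrix, lower)
            if REQUIRES.get(p) not in (None, CLASS_OF[subject])]

print(violations("EQTOVO", "the number 12", "want", "be divisible"))  # ['want']
print(violations("EQTOVO", "the cat", "want", "be divisible"))  # ['be divisible']
print(violations("TOVO", "the pump", "seem", "fail"))           # []
```

Note that seem never appears in the TOVO result: under raising the matrix verb is simply not consulted for the surface subject.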

In Sager’s grammar, these two categories are conflated. Some existing lexical entries therefore
require updating, since this distinction was introduced after PUNDIT’s lexicon was established.

2.9  NTOVO 

Like OBJTOVO (see below), NTOVO is associated with surface sequences of the form ‘NP to VP’
following the matrix verb; it corresponds to what is sometimes called ‘exceptional case marking
(ECM)’. An example of NTOVO is The factory expects the fe to repair the sac:

OPS:   present
VERB:  expect
SUBJ:  the factory (sing)
OBJ:   OPS:   untensed
       VERB:  repair
       SUBJ:  the field^engineer (sing)
       OBJ:   the sac (sing)

Thus the field engineer is the subject of the clause but is not a direct object of the matrix verb;
the factory does not expect the fe, but rather it expects the proposition expressed by the infinitive.
(A consequence of this is that pleonastic elements such as there may occur in subject position of
NTOVO: I expect there to be unlimited champagne.)

The passobj counterpart of NTOVO is TOVO, as in The fe is expected to repair the sac; the ISR rule
associated with TOVO will automatically reconstruct the infinitive the fe to repair the sac:

OPS:   present
VERB:  expect
SUBJ:  passive
OBJ:   OPS:   untensed
       VERB:  repair
       SUBJ:  the field^engineer (sing)
       OBJ:   the sac (sing)

2.10  OBJTOVO 

OBJTOVO corresponds to object-controlled equi; in The factory told the fe to repair the pump, the
fe is an argument (indirect object?) of the matrix verb and subject of the infinitive:

OPS:    past
VERB:   tell
SUBJ:   the factory (sing)
D_OBJ:  the field^engineer (sing)
OBJ:    OPS:   untensed
        VERB:  repair
        SUBJ:  the field^engineer (sing)
        OBJ:   the pump (sing)

The semlabel d_obj (dative object, formerly known as inner_obj) is used here to capture the
parallelism with The factory told the fe the truth.

The passobj counterpart is EQTOVO. The ISR rules associated with EQTOVO reconstruct the infinitive
as above for The fe was told to repair the sac:

OPS:    past
VERB:   tell
SUBJ:   passive
D_OBJ:  the field^engineer (sing)
OBJ:    OPS:   untensed
        VERB:  repair
        SUBJ:  the field^engineer (sing)
        OBJ:   the sac (sing)


Major differences between NTOVO and OBJTOVO: in NTOVO, the subject of the infinitive is an argument
ONLY of the lower verb. The entire infinitival clause is itself the argument of the matrix verb.
There are no selection restrictions between, e.g., believe and the table in I believed the table to
be quite attractive. In OBJTOVO, on the other hand, the noun phrase between the matrix verb
and the infinitive is an argument of both matrix and embedded predicates, as demonstrated by
the anomaly of I persuaded the table to seat 6 (violates selectional constraints on persuade) and I
persuaded the man to be divisible by 2 (violates selectional constraints on divisible). Also, NTOVO
but not OBJTOVO allows there as subject (I expect there to be a diplomat at the party vs. *I persuaded
there to be a diplomat at the party).
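
The structural difference between the two object types can be sketched in Python as two small builders. The dictionary form of the ISR and the names ntovo_isr and objtovo_isr are assumptions for illustration only.

```python
def ntovo_isr(ops, verb, subject, np, infinitive):
    """ECM: the NP is an argument of the lower verb only."""
    clause = dict(infinitive, SUBJ=np)
    return {"OPS": ops, "VERB": verb, "SUBJ": subject, "OBJ": clause}

def objtovo_isr(ops, verb, subject, np, infinitive):
    """Object-controlled equi: the NP is also the matrix verb's D_OBJ."""
    clause = dict(infinitive, SUBJ=np)
    return {"OPS": ops, "VERB": verb, "SUBJ": subject,
            "D_OBJ": np, "OBJ": clause}

repair = {"OPS": "untensed", "VERB": "repair", "OBJ": "the pump (sing)"}
fe = "the field^engineer (sing)"

expect = ntovo_isr("present", "expect", "the factory (sing)", fe, repair)
tell = objtovo_isr("past", "tell", "the factory (sing)", fe, repair)

print("D_OBJ" in expect)   # False
print(tell["D_OBJ"] == tell["OBJ"]["SUBJ"])   # True
```

Only the OBJTOVO structure gives the matrix verb a slot of its own for the noun phrase, which is where a selection restriction such as that of persuade would apply.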

PUNDIT does not currently handle the rare cases of subject-controlled equi in verb complements of
the form ‘NP to VP’, as in Mary promised Louise to arrive on time. This form of control is largely
restricted to the single verb promise.

2.11  THATS 

and 

2.12  ASSERTION 

THATS  and  ASSERTION  are  both  tensed  clauses,  with  and  without  the  complementizer  that,  as  in 
The  fe  said  that  the  disk  drive  was  inoperative: 

OPS:   past
VERB:  say
SUBJ:  the field^engineer (sing)
OBJ:   OPS:   past
       VERB:  be
       SUBJ:  the disk^drive (sing)
       ADJ:   inoperative

Verbs subcategorized for THATS and ASSERTION are automatically subcategorized for these same
objects in the passive, given the possibility of pleonastic subjects, as in It is said that whales are
highly intelligent. Work remains to be done to constrain these cases in the grammar. General
note on passobjs with verbs taking clausal objects (ASSERTION, THATS, PNTHATS, SVO, C1SHOULD,
SNWH, NSNWH, NTHATS): in Sager, passives with it subject (It was reported that the disk failed)
are not treated as having a clausal passobj. Rather, the clause goes into RV at the string level.
However, it seems to me that these verbs should all be subcategorized for clausal passobj.

2.13  PNTHATS 

This  is  a  PN  followed  by  THATS,  as  in  The  fe  reported  to  the  factory  that  the  sac  had  failed: 

OPS:   past
VERB:  report
SUBJ:  the field^engineer (sing)
PP:    to
       the factory (sing)
OBJ:   OPS:   past, perf
       VERB:  fail
       SUBJ:  the sac (sing)

These objects are further subcategorized for pvals, like all PN-containing objects. Not every VERB
+ PP + CLAUSE structure involves a PNTHATS; for example, this proves with some certainty that
the world is round should be analyzed as a THATS with preceding PN in SA, while this proved to
everyone that the theory was wrong should be treated as PNTHATS with PN in OBJECT.

The passobj counterparts are PN and PNTHATS, as in It was revealed to us yesterday that the
company had gone bankrupt (PNTHATS as passobj), or That Smith was the culprit was announced
to the entire assembly (PN as passobj).

2.14  SVO 

SVO is a tenseless clause; it differs from C1SHOULD (see below) in that (1) SVO never has the
complementizer that, (2) a pronoun subject of SVO is accusative. Example: She saw them replace
the pump:

OPS:   past
VERB:  see
SUBJ:  pro: she (sing)
OBJ:   OPS:   untensed
       VERB:  replace
       SUBJ:  pro: them (pl)
       OBJ:   the pump (sing)

Passivization is not acceptable out of SVO, cf. *They were seen replace the pump.

2.15  C1SHOULD

This consists of the complementizer that followed by SVO, as in He suggested that it be replaced:

OPS:   past
VERB:  suggest
SUBJ:  pro: he (sing)
OBJ:   OPS:   untensed
       VERB:  replace
       SUBJ:  passive
       OBJ:   pro: it (sing)

Passobj counterparts: C1SHOULD, as in It was suggested that we leave early; and probably NULLOBJ.
(My intuitions are unclear on NULLOBJ as passobj here.)

A pronoun subject of C1SHOULD is nominative. The current BNF rule for C1SHOULD requires that,
but should be generalized to account for I suggest we leave.
2.16  PNTHATSVO


This consists of PN followed by C1SHOULD, as in I suggested to Bill that he write up his
investigations. Pvals are elicited by the Lexical Entry Procedure. Passobj counterparts are PN and
PNTHATSVO.

2.17  SNWH 

Not currently implemented. This is an indirect question, an embedded clause beginning with a
wh-word. Examples: I know who borrowed the car, She wondered whether it would snow. Passobj
counterparts are SNWH and NULLOBJ, as in It was finally revealed who stole the car, or What he
was really up to that day was revealed months later at the investigation.


2.18  NSNWH 

Not currently implemented. This is an NP object followed by an indirect question, as in He asked us
whether it would snow. Passobj counterparts: SNWH, NULLOBJ.


2.19  NTHATS 

This  is  an  NP  followed  by  a  THATS,  as  in  She  told  the  factory  that  the  sac  was  inoperative: 


OPS:    past
VERB:   tell
SUBJ:   pro: she (sing)
D_OBJ:  the factory (sing)
OBJ:    OPS:   past
        VERB:  be
        SUBJ:  the sac (sing)
        ADJ:   inoperative


Note that the NP object is marked as a dative object (semlabel d_obj, formerly inner_obj). This
is because of the parallelism with dative constructions like He told the factory the truth.

Passobj counterpart: THATS. The semlabelling of this construction in passive is currently being
refined in order to distinguish between cases like He was told that the pump was inoperative, where
the subject should be marked as d_obj; and It was said that the pump was defective, where expletive
it should not be represented in the argument structure at all.


2.20  SVEN 

This  is  a  predicative  small  clause,  as  in  He  had  the  sac  repaired  quickly: 

OPS:   past
VERB:  have
SUBJ:  pro: he (sing)
OBJ:   OPS:   untensed
       VERB:  repair
       SUBJ:  passive
       OBJ:   the sac (sing)
       ADV:   quickly


This sentence is ambiguous between SVEN and NSTGO analyses of the object: the NSTGO reading
can be paraphrased He had the sac which had been repaired quickly, while the SVEN reading can
be paraphrased He caused the sac to be repaired quickly. In the latter case, no one need be in
possession of the sac. This difference is clearer still in She found the book missing. Clearly, book
is not itself an argument of find, since the book was not found; what was found (out) was the
proposition the book is missing. There’s a lot of variation here, though; sometimes the subject of
the small clause under find also seems to be an argument of the verb, especially in the passive (The
car was found parked on Elm Street). Other verbs are clearer: They reported the car stolen doesn’t
mean that they reported the car, nor does He had the stairs fixed mean that he had the stairs.
Probably one should split hairs and use two different BNF nodes corresponding to the NTOVO vs.
OBJTOVO (exceptional case marking vs. object-controlled equi) distinction.

Passobj  counterpart:  VENPASS,  as  in  The  gear  teeth  were  found  stripped  and  corroded.  SVEN  doesn’t 
always  passivize,  as  above.  (ISR  rule  is  still  under  development  for  this  passobj.) 


2.21  NN 

NN  is  the  double  object  dative,  as  in  The  factory  found  her  a  new  pump  or  They  told  her  the  result: 

OPS:    past
VERB:   tell
SUBJ:   pro: they (pl)
D_OBJ:  pro: her (sing)
OBJ:    the result (sing)

Note that the indirect object is semlabelled d_obj.

Passobj counterpart is NSTGO, as in She was told the result:

OPS:    past
VERB:   tell
SUBJ:   passive
D_OBJ:  pro: she (sing)
OBJ:    the result (sing)

Note that NP + NP sequences need not be parsed as NN. I gave Ruth a good answer contains NN,
but I consider Ruth a good dancer is SOBJBE (below).

Many but not all NNs have counterparts with the to- or for- dative; thus give books to Louise
alternates with give Louise books. However, in some cases only the prepositional form is found
(compare the meaning of I got my degree for my parents (not for myself) with that of I got my
parents my degree); in other cases, we find only NN, as in The book cost Mary five dollars. The
two constructions (NN and prepositional datives) have different semantic properties, so we do not
want to attempt to represent them identically in the ISR.

2.22  SOBJBE 

This is another small clause, consisting of subject followed by OBJBE (NSTG, ASTG, or PN), as in I
consider him a genius or They consider it inoperative:

OPS:   present
VERB:  consider
SUBJ:  pro: they (pl)
OBJ:   OPS:   untensed
       VERB:  be
       SUBJ:  pro: it (sing)
       ADJ:   inoperative

Sager has further subcategorization for NSTG or ASTG or PN (or DSTG, not included here) via bvals in
the lexicon, since some verbs do not allow all OBJBE options: cf. That made her angry, That made
her the reigning monarch, *That made her in a state of rage. PUNDIT’s Lexical Entry Procedure
does not currently elicit bvals.

The passobj counterpart is OBJBE, as in He is considered a genius by his associates or It is
considered inoperative:

OPS:   present
VERB:  consider
SUBJ:  passive
OBJ:   OPS:   untensed
       VERB:  be
       SUBJ:  pro: it (sing)
       ADJ:   inoperative

2.23  NA

This is a sequence of NP followed by an adjective phrase, as in She painted the barn red or They
stripped the gears bare:

OPS:     past
VERB:    strip
SUBJ:    pro: they (pl)
OBJ:     the gear (pl)
RES_CL:  OPS:   untensed
         VERB:  be
         SUBJ:  the gear (pl)
         ADJ:   bare

The NA object type differs from SOBJBE in several respects. First, in NA the NP is an argument of
the verb; if one paints the barn red, one has definitely painted the barn, whereas to have found the
book missing is not to have found the book, and to believe the problem insoluble is not to believe
the problem. Furthermore, the predication relationship between the adjective phrase and the NP
is interpreted as a result in the case of NA. Finally, there is sometimes idiosyncratic selection
between verb and adjective in NA, but not in SOBJBE. Thus We sanded it smooth sounds fine, but
We sanded it ugly sounds odd, even if the ugliness is interpreted as resulting from the sanding.

The passobj counterpart is ASTG, as in The house was painted red or It was stripped bare:

OPS:     past
VERB:    strip
SUBJ:    passive
OBJ:     pro: it (sing)
RES_CL:  OPS:   untensed
         VERB:  be
         SUBJ:  pro: it (sing)
         ADJ:   bare

2.24  ASTG


Example:  It  went  bad: 

OPS:  past 

VERB:  go 

SUBJ:  pro:  it  (sing) 

ADJ :  bad 

Verbs with the ASTG object select for particular adjectives, as in He went mad (vs. the anomalous
He went sane), and do not subcategorize for other OBJBE options (*He went a madman). But the
selection seems semi-semantic: He turned blue/green/mean/sour/serious but *He turned old/happy.
Thus it might not be possible to subcategorize for specific lexical items.

No  passive. 


2.25  DSTG 

This is also quite rare. Certain verbs subcategorize for specific adverbs (He means well vs. *He
means warmly, or She did beautifully vs. *She did quietly). No passive.


2.26  DP1

This  is  the  simplest  verb-particle  combination,  as  in  He  showed  off,  We  lined  up  (vs.  *He  showed 
out,  *We  lined  over),  or  Engine  jacks  over. 

OPS:   present
VERB:  jack
SUBJ:  engine (sing)
PTCL:  over

No  passive. 

2.27  DP2

DP2 is a particle followed by an NP, as in He ran up the bill. In contrast, He ran up the hill in
its normal interpretation is NOT a DP2, but is rather a PN object. One test: only particles can
occur to the right of the noun; He ran the bill up vs. *He ran the hill up, to cite a classic example.
Another test: only a PN can be topicalized, since it’s a constituent: Up the HILL he ran vs. *Up
the BILL he ran. Another example: They blew up the ship:

OPS:   past
VERB:  blow
SUBJ:  pro: they (pl)
PTCL:  up
OBJ:   the ship (sing)

Passobj counterpart is the particle, DP1, as in A huge bill was run up that evening or The ship was
blown up:

OPS:   past
VERB:  blow
SUBJ:  passive
OBJ:   the ship (sing)
PTCL:  up

2.28  DP3

DP3 is just the permuted version of DP2, where the particle follows the noun phrase. Same passobj
as DP2; order regularized in ISR. Since there are no transformations in PUNDIT, such alternations
as that between DP2 and DP3 must be handled lexically.

2.29  DP1PN

This is a particle followed by a PP: She moved in on him, They found out about it, The factory
should have followed up on it:

OPS:  past, shall, perf
VERB:  follow
SUBJ:  the factory (sing)
PTCL:  up
PP:  on
     pro: it (sing)

Passobj counterpart is DP1P, when it passivizes, as in The announcement was led up to by a series
of remarks about the company's financial difficulties (?), or It should have been followed up on:

OPS:  past, shall, perf
VERB:  follow
SUBJ:  passive
PP:  on
     pro: it (sing)
PTCL:  up

2.30  DP2PN 

DP2PN is a DP2 (particle + NP) followed by a PN, as in He mixed up the apples with the pears.

Passobj counterpart: DP1PN, as in The apples were mixed up with the pears. (Not, for example,
*The pears were mixed up the apples with.)

2.31  DP3PN

This is a DP3 (NP + particle) followed by a PN, as in mix the apples up with the pears. Passobj
counterpart is also DP1PN.

2.32  DPSN 

DPSN  is  a  particle  followed  by  a  clause,  as  in  She  found  out  where  it  was  hidden,  He  pointed  out  that 
it  was  noon  already,  They  often  make  out  to  be  villains,  or  She  found  out  that  it  was  inoperative: 


OPS:  past
VERB:  find
SUBJ:  pro: she (sing)
OBJ:  OPS:  past
      VERB:  be
      SUBJ:  pro: it (sing)
      ADJ:  inoperative
PTCL:  out

Passobj counterparts are DPSN, as in It was pointed out frequently that the plan could not succeed,
and DP1, as in Where it was hidden was never really found out. Both sound a little marginal, but
might occur.

1.  Introduction

This  guide  is  intended  to  introduce  the  PUNDIT  community  to  the  selectional  component,  and 
to  answer  any  questions  that  users  may  have  about  its  use  and  operation.  Improvements  and 
suggestions  are  most  welcome. 

2.  Rationale 

The  purpose  of  this  module  is  to  collect  empirically  observed  word-level  selectional  patterns 
from  data,  and  to  support  generalisation  of  these  patterns  to  semantic  class  patterns.  These 
patterns  are  classified  into  valid  and  invalid  patterns,  and  stored  in  a  pattern  database. 

3.  Basics

The selectional component is invoked from the BNF grammar by two restrictions:
vso_selection and np_selection.

•  vso_selection is called from the CENTER and ASSERT_FRAG nodes, to check selection in
assertions, questions, and fragments.

•  np_selection is called from the NSTG node, to check selection in LNR nodes.

The BNF rules in question are the following:

:- center ::=
    ((({dquest2}, assertion, {w_endmark} -> assertion), {vso_selection}) xor
     (({dquest1}, question, {w_endmark} -> question), {vso_selection}) xor
     ((fragment, {w_endmark} -> fragment), {vso_selection})) xor
    (compound -> compound), {vso_selection}.

:- assert_frag ::=
    ((assertion, internal_punct -> assertion) xor
     (fragment, internal_punct -> fragment)), {vso_selection}.

:- nstg ::=
    ({d_endmark},
        (((lnr -> lnr), {np_selection});
         (lpror -> lpror);
         (nsvingo -> nsvingo)));
    ({d_gap}, nullwh -> nullwh).

¹This work has been supported in part by DARPA under contract N00014-85-C-0012, administered by
the Office of Naval Research (APPROVED FOR PUBLIC RELEASE, DISTRIBUTION UNLIMITED), in part by
National Science Foundation contract DCR-85-02205, as well as by Independent R&D funding from
System Development Corporation, now part of Unisys Corporation.


These two restrictions then examine the ISR for those nodes to check selectional patterns in the
assertion, question, fragment, or noun phrase. At the time these restrictions are called, the ISRs
are expected to be instantiated, simplified, fully assembled and lambda-free. If the restrictions
encounter an ISR which is not in simplified operator-operand form, a very visible warning message
will be issued to the user by the “soop checker”. Assuming the ISR is well formed, each of
the two restrictions then calls a definite-clause grammar (DCG) to analyze the ISR.

4.  At  the  Top  Level 

After  typing  in  a  sentence,  the  user  will  be  asked  to  enter  a  unique  sentence  ID  if  that  sentence 
has  not  yet  been  recorded  in  the  current  corpus  of  sentences. 

The parser will then parse away, and when a complete LNR or sentential node has been assembled,
the ISR for that node will be passed to the DCG, and the questions will begin.

5.  The Queries

In the course of examining the ISR, the selection mechanism will ask certain questions about the
validity of lexical co-occurrence patterns. Some typical questions (with some sample answers in
italics) are

Is this <svo> pattern good: field^engineer repair sac --> y

Is this <qpos/n> pattern good: <NUMBER> sac --> y

Is this <n/pp> pattern good: loss of sac --> y

Is this <n/n> pattern good: sac failure --> y

Is this <adj/n> pattern good: fine particle --> y


The  question  contains  two  important  parts: 

•  the  type  of  pattern  (e.g.,  svo,  qpos/n,  n/pp,  n/n,  adj/n) 

•  the  specific  lexical  items  which  form  that  pattern  (In  certain  cases,  special  atoms  such  as 
<NUMBER>  will  appear  in  the  pattern  instead  of  actual  lexical  items.  These  special 
atoms  are  discussed  in  more  detail  below). 

6.  The  Pattern  Types 

The types of patterns are listed in Figure 1 below (this list is subject to change), with examples
for any non-obvious patterns. The names of the patterns will eventually change, since currently
the slash (“/”) is overloaded, denoting conjunction, modification, and siblinghood.

7.  The  Responses 

When  prompted  with  such  a  lexical  pattern,  the  user  has  several  possible  responses: 

(1)  “y”:  (YES) Signals a globally good pattern. Answer with y when the pattern is semantically
acceptable, consistent with the domain, and plays the intended role in the sentence
(i.e., leads to a correct parse).

Figure 1: Types of Selectional Patterns

PATTERN    EXPLANATION and EXAMPLES of GOOD PATTERNS

adj/n      An adjective (either attributive or predicate) modifying a noun
           EX: FINE metal PARTICLES found in filter.

adj/pp     A PP functioning as adjective complement
           EX: Oil is DARK IN APPEARANCE.

adv/adj    An adverb modifying an adjective
           EX: Sac is COMPLETELY INOPERATIVE.

adv/p      An adverb modifying a preposition
           EX: Pressure is SIGNIFICANTLY OVER the limit.

conj/adj   Conjoined adjectives
           EX: Loss of pressure was SUDDEN and UNEXPECTED.

conj/n     Conjoined nouns
           EX: Loss of PRESSURE and TEMPERATURE.

conj/v*    Conjoined main verbs of 2 sentences
           EX: The sac BROKE, and the fe REPAIRED it.

n/adj      A noun modifying an adjective
           EXS: FACTORY INSTALLED, CRYSTAL CLEAR

n/adv      An adverb in an NSTG_FRAG
           EX: Sac FAILURE YESTERDAY.

n/n        A compound noun
           EX: Loss of OIL PRESSURE.

n/pp       A PP modifying a noun
           EX: EROSION OF IMPELLOR is evident.

n/predn    A subject and predicate noun
           EX: Alarm CAPABILITY is a NECESSITY.

nq         An NQ consists of a noun followed by a Q.
           EX: See FIGURE S.

nq/n       An NQ modifying a noun
           EX: the FIGURE S STATISTICS

qn         A QN consists of a Q followed by a noun.
           EXS: 10 DAY, g INCH

qn/n       A QN modifying a noun
           EXS: a FIVE ALARM FIRE, a SOO PAGE BOOK

qpos/n     A quantifier modifying a noun
           EX: The fe repaired 5 SACS.

svo        Subject verb object
           EX: The FE REPAIRED the SAC.

v/adv      An adverb modifying a verb
           EX: The sac FAILED SUDDENLY.

v/pp       A PP modifying a verb
           EX: Metal particles were DISCOVERED IN OIL FILTER.

v/qn       An NQ modifying a verb
           No known examples in the sac domain.
           A muck example is Ship CLEARING 80 DEGREES.

*Patterns marked with an asterisk are not currently presented to the user
because they contain no significant selectional information.


(2)  “n”:  (NO) Signals a globally bad pattern. Answer with n when the pattern is semantically
unacceptable, or not consistent with the domain.

(3)  “s”:  (SUCCEED) Signals a locally good pattern. Answer with s in either of two situations.
One is when the pattern is semantically unacceptable, and not consistent with the domain,
but happens to be part of the right parse for the sentence. These are rare, and are usually
caused by “phrasal attributes” (the “stiff neck phenomenon”). A discussion of this
troublesome phenomenon is included in the next section. An example from the sac domain
is found in the sentence Oil pressure dropped below SO psig, which generates the PP pattern
[drop, below, psig], which is anomalous, but, at least in this parse, correct. The
other case in which one should answer s is when one wishes to defer judgement about a
pattern whose validity or acceptability in the domain is in doubt. If the user is not willing
to categorically state that the pattern is anomalous, but, on the other hand, is not convinced
of its validity, s is the correct response.

(4)  “f”:  (FAIL) Signals a locally bad pattern. Answer with f when the pattern may be
semantically acceptable and consistent with the domain, but happens to be part of a
wrong parse. Example: The sentence Loss of oil pressure might generate the pattern
[loss, of, oil], which may be semantically valid, but is not part of the right parse.

(5)  “a”:  (ABORT) Abort parsing this sentence. More on this option later.

(6)  “b”:  (BREAK) Enter a break level. Has the effect of typing b to the debugger or calling the
goal break in Prolog (which is, of course, exactly what this does).

(7)  “e”:  (EXPLAIN/EXPAND/EXAMPLES) Ask for an explanation of the pattern and additional
examples of such patterns. (This feature has not yet been fully implemented.)

(8)  “h”:  (HELP) Ask for a summary of possible answers.

(9)  Same as h.

8.  The  Phrasal  Attribute  Problem 

As mentioned above, “locally good” patterns are used to deal with phrasal attributes. An
example of this phenomenon taken from a medical domain is the noun phrase stiff neck. The
semantic class of the head noun of this NP, neck, is something like BODY-PART, but the
semantic class of the full NP stiff neck is not BODY-PART, but rather SYMPTOM or AILMENT.
This discrepancy between the semantic classes of the full NP and of its head noun
presents a difficulty in making a decision about the acceptability of patterns generated. For
example, in parsing the sentence Patient has stiff neck, the system would present to the user
the SVO pattern [patient, have, neck]. Note that this is indeed the correct syntactic parse (in
fact, probably the only one), but we do not want to assert for posterity that the SVO pattern
[patient, have, neck] is semantically acceptable in a medical sublanguage.

This is perhaps a subtle point, but not everything that is true in a sublanguage can be said in
that sublanguage. The sentence Patient has stiff neck is a case in point: Although it is certainly
true that the patient has a neck, nobody would ever (bother to) say so because the proposition
is completely uninformative. Indeed, it is one of the characteristics of a sublanguage that
certain (true) information is presupposed, and never explicitly stated.

In short, the parse is good, but the pattern [patient, have, neck] is bad. We do not want to
say the pattern is good, but saying it is bad will fail the parse, and that is not a desirable result
either. Hence the appropriate response to the query about this pattern would be to tag it as
“locally good”, which is a sort of compromise implemented in order to allow the parse to
succeed, but without entering the pattern in question into the (global) pattern database.

Our method of dealing with this phenomenon is admittedly not satisfactory. However, pending
a fuller semantic treatment of NPs which allows such distinctions to be made, it at least permits
the correct parse to be obtained without creating obviously bad patterns.

For an example of the phrasal-attribute phenomenon from the SAC domain, consider the sentence
Start air pressure dropped below SO psig, which generates the PP pattern drop below
psig. The problematic NP here is SO psig: the semantic class of the head noun psig is
UNIT-OF-MEASUREMENT, yet we would not say that the full NP SO psig is a
UNIT-OF-MEASUREMENT; SO psig is instead an entity of the class LEVEL or perhaps THRESHOLD.
The problem is that in evaluating the pattern drop below psig, we would realize that pressure
can drop below a certain level or below a certain threshold, but it cannot drop below a unit of
measurement. The solution is to tag this pattern as locally good.

9.  Special  Atoms  Appearing  in  Patterns 

There  are  a  number  of  special  atoms  which  can  appear  in  a  pattern.  Using  such  an  atom  in  a 
pattern  usually  serves  one  of  two  purposes: 

•  To generalize a pattern immediately. For example, the qpos/n pattern [S, sac] should
have the same selection as [8, sac], so we generalize both these patterns to
[<NUMBER>, sac] on the fly. The generalization applies to numbers, dates, times, and
other entities whose specific value or instantiation is irrelevant for selectional purposes.
All that is relevant for such an entity is simply that it is in fact, e.g., a date. Typically,
these special atoms are productive forms recognized by PUNDIT's shapes component.

•  To serve as a placeholder for an entity whose internal structure is irrelevant to selection
(e.g., <CLAUSE>), or whose referent is not inferrable from the ISR (e.g.,
<SOMEBODY/THING>, <WH>).

The  special  atoms  are  the  following: 

(1)  <NUMBER>:  stands for a number.

Ex: 5 sacs failed will generate the qpos/n pattern [<NUMBER>, sac].

(2)  <TIME>:  stands for a time.

Ex: The fe repaired the sac at 1150T will generate the v/pp pattern [repair, at,
<TIME>].

(3)  <DATE>:  stands for a date.

Ex: The fe repaired the sac on l£/S5-tSS9 will generate the v/pp pattern [repair, on,
<DATE>].

(4)  <PART_NO>:  stands for a part number.

Ex: The fe repaired 1S3-4S8 will generate the svo pattern [field^engineer, repair,
<PART_NO>].

(5)  <CLAUSE>:  stands for a clause.

Ex: The fe reported that the sac failed will generate the svo pattern [field^engineer,
report, <CLAUSE>].

(6)  <SOMEBODY/THING>:  stands for a passive or elided constituent.

Exs: The sentences Repaired the sac and The sac was repaired will both generate the
svo pattern [<SOMEBODY/THING>, repair, sac].

(7)  <NULL>:  stands for a null object.

Ex: Sac failed will generate the svo pattern [sac, fail, <NULL>]. (Do not confuse this
with nulln below.)

(8)  <WH>:  stands for a wh-word.

Ex: Who repaired the sac? will generate the svo pattern [<WH>, repair, sac].

(9)  nulln:  stands for a null head noun.

Ex: £ broke (as in The fe installed 4 sacs, and £ broke) will generate the qpos/n pattern
[<NUMBER>, nulln] and the svo pattern [nulln, break, <NULL>]. (This last pattern
is hardly perspicuous, and will need treatment by some mechanism designed to handle
referential information.)
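
The on-the-fly substitution of special atoms can be pictured with a small Prolog sketch. It is only an illustration: the predicate names special_atom/2 and generalize_pattern/2, and the token shapes time(_), date(_), part_no(_) and clause(_), are assumptions made for this example and are not taken from the PUNDIT sources, where the shapes component does the recognition.

```prolog
% Hypothetical sketch: replace pattern elements by special atoms.
% The wrapped terms time(_), date(_), part_no(_) and clause(_) are
% assumed stand-ins for whatever the shapes component really produces.
special_atom(Token, '<NUMBER>')       :- number(Token), !.
special_atom(time(_), '<TIME>')       :- !.
special_atom(date(_), '<DATE>')       :- !.
special_atom(part_no(_), '<PART_NO>') :- !.
special_atom(clause(_), '<CLAUSE>')   :- !.
special_atom(Token, Token).           % ordinary lexical items stay as they are

% generalize_pattern(+Words, -Generalized)
generalize_pattern([], []).
generalize_pattern([T|Ts], [G|Gs]) :-
    special_atom(T, G),
    generalize_pattern(Ts, Gs).
```

Under these assumptions the goal generalize_pattern([5, sac], P) binds P to ['<NUMBER>', sac], and any other number in the first position gives the same result, which is the point of the generalization.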

The  use  of  some  of  these  special  atoms  (specifically,  <SOMEBODY/THING>,  <NULL>, 
<WH>,  and  nulln)  is  not  always  intuitive,  and  is  likely  to  change  in  the  near  future. 

10.  Generalization to Class-Level Patterns

After answering the word-level query with either y or n, the user will then be asked to form a
generalization of that pattern based on the information in the domain isa hierarchy, provided,
of course, that there is a hierarchy.²


The interface for this section is still in flux, but the current state of affairs is as follows: After
answering that a given pattern is good (or bad), the user is shown all possible generalizations for
each word in the pattern appearing in the domain hierarchy. For example, when generalizing
the svo pattern [miller, sight, kynda], the user would be shown the output in Figure 2 below.


Each line of the display in Figure 2 shows a path from the concept in question up to the children
of the root concept. The user is then asked to choose which, if any, of the concepts are
correct generalizations of the lexical item.
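
How such paths might be enumerated can be sketched in Prolog over a hierarchy stored as isa(Sub,Super) clauses. This is a sketch under stated assumptions, not the module's code: the root name thing, the predicate path_up/2, and the particular isa/2 facts are invented for illustration, with the facts chosen so that the paths for miller resemble a display of this kind.

```prolog
% Toy hierarchy; the facts and the root name `thing` are assumptions.
isa(miller, knox).
isa(knox, us_platform).
isa(knox, frigate).
isa(us_platform, platform).
isa(frigate, ship).
isa(ship, surface_platform).
isa(surface_platform, platform).
isa(platform, physical_object).
isa(platform, platform_group).
isa(platform_group, physical_object).
isa(physical_object, thing).

root(thing).

% path_up(+Concept, -Path): Path runs from Concept up to a child of the root.
path_up(C, [C]) :-
    isa(C, R),
    root(R).
path_up(C, [C|Rest]) :-
    isa(C, S),
    \+ root(S),
    path_up(S, Rest).
```

The goal findall(P, path_up(miller, P), Paths) collects one list per line of such a display. A concept with several parents simply contributes several paths, which is why the same word can appear with more than one candidate generalization.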


Figure 2: Generalizations Based on the Domain Hierarchy

The SVO pattern is [miller, sight, kynda]

GENERALIZING MILLER
These are the possible generalizations for MILLER:
miller knox us_platform platform physical_object
miller knox us_platform platform platform_group physical_object
miller knox frigate ship surface_platform platform physical_object
miller knox frigate ship surface_platform platform platform_group physical_object
***************************** GENERALIZING KYNDA
These are the possible generalizations for KYNDA:
kynda ur_platform platform physical_object
kynda ur_platform platform platform_group physical_object
kynda cruiser ship surface_platform platform physical_object

Please enter the generalizations for MILLER (or type "!help." for help).
Generalizations: >>


²The selection mechanism expects the hierarchy to be encoded in clauses of the form isa(Sub,Super) and
semantic_type(Sub,Super).

The intention of the generalizations is that, given the good (bad) pattern P, which contains the
word SUB, SUB generalizes to SUPER in the pattern P iff for every concept C such that C
isa* SUPER (where isa* is the transitive closure of isa), the pattern Q, which is P with C
substituted for SUB, is also a good (bad) pattern.
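
The condition can be written down directly as a Prolog check, shown here for good patterns only. This is a minimal sketch of the definition, not the module's implementation: in practice the judgment is supplied by the user, and the predicates good_pattern/2, isa_star/2, substitute/4 and generalizes/4, together with the toy facts, are assumptions made for the example.

```prolog
% Toy hierarchy and pattern store (assumed for illustration).
isa(miller, frigate).
isa(kynda, cruiser).
isa(frigate, ship).
isa(cruiser, ship).

good_pattern(svo, [miller,  sight, kynda]).
good_pattern(svo, [frigate, sight, kynda]).
good_pattern(svo, [cruiser, sight, kynda]).
good_pattern(svo, [kynda,   sight, kynda]).

% isa_star(?C, +Super): transitive closure of isa/2.
isa_star(C, Super) :- isa(C, Super).
isa_star(C, Super) :- isa(C, Mid), isa_star(Mid, Super).

% substitute(+Old, +New, +Words, -NewWords): replace Old by New throughout.
substitute(_, _, [], []).
substitute(Old, New, [X|Xs], [Y|Ys]) :-
    ( X == Old -> Y = New ; Y = X ),
    substitute(Old, New, Xs, Ys).

% generalizes(+Type, +Words, +Sub, +Super): true when no concept C below
% Super yields a pattern that is not recorded as good.
generalizes(Type, Words, Sub, Super) :-
    \+ ( isa_star(C, Super),
         substitute(Sub, C, Words, Q),
         \+ good_pattern(Type, Q) ).
```

With these facts, generalizes(svo, [miller, sight, kynda], miller, ship) succeeds because every concept below ship gives a recorded good pattern in subject position, while the same goal with kynda as the word to generalize fails, since [miller, sight, frigate] is not recorded.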

The help message (printed in response to typing “!help.” to the prompt

Generalizations: >>

in Figure 2) is quite informative, and makes available a number of useful options. The help
message reads as follows:

Type your choices separated by commas, and terminated with a period.
List format is not necessary.
Type "[]" if you do not want to generalize at all.
Type "!break" to enter a break level.
Type "!abort" to abort parsing this sentence.
Type "!subs" to see the sub concepts of a concept.
Type "!supers" to see the super concepts of a concept.
Type "!help" to generate this message.

By  invoking  the  commands  described  in  the  help  message,  the  user  can 

•  enter  a  break  level  (just  like  at  the  word-level  prompt) 

•  abort  out  of  parsing  (more  on  this  option  below), 

•  ask  to  see  all  the  immediate  sub  concepts  of  a  given  concept 

•  ask  to  see  all  the  immediate  super  concepts  of  a  given  concept 

•  generate  the  help  message. 

After invoking either the !subs or !supers options, the user is prompted for the name of the
concept whose descendants or ancestors are to be displayed.

11.  Files 

There are two files which the selection module can write to (or create):
SELECTIONAL_PATTERNS.pl and USER_CORPUS.pl. Both files are in the current working
directory. Since selection expects to be able to write to those files in the current working
directory, users should ensure that they have write permission to the current working directory in
order to run with selection on and save output to a file.³

³It is possible to change the current working directory (while running under Emacs) by typing Esc-x cd followed
by the desired directory. The current directory can also be changed directly through Prolog by executing the goal

| ?- unix(cd(DIR)).

where DIR in the goal is an atom corresponding to a valid directory. Note: Changing the current working directory
via Prolog does not allow specifying a path beginning with ~ (tilde), but the Esc-x cd method does.

The SELECTIONAL_PATTERNS.pl file is used to store the patterns that the user has been
queried about. The file contains lines of the form

:-record_pattern(of,
      bad_selectional_pattern(n/pp,[loss,of,second],user(s3))).

:-record_pattern(and,
      good_selectional_pattern(conj/n,[sac,and,disk],user(s4))).

:-record_pattern(fail,
      good_selectional_pattern(svo,[sac,fail,'<NULL>'],user(s4))).

Once such a file has been created, one need only compile it, and the patterns will be loaded in.
The USER_CORPUS.pl file is used to store the sentences that the user has parsed with selection
on. The file contains lines of the form

:-records(casreps,id([s2],[the,sac,failed,'.']),_660).

:-records(casreps,id([s4],[the,sac,and,the,disk,failed,'.']),_393).

The sentences stored in this form are used for parsing sentences in batch mode and for the
test_pundit procedure.

The  following  are  the  other  selection-related  files  in  the  stable  system,  and  the  contents  of  each 
file: 

(1)  selection_dcg.pl:  The DCG to parse the ISR.

(2)  selection_query.pl:  The query and generalization mechanism.

(3)  selection_restr.pl:  The two selection restrictions (vso_selection and
lnr_selection).

(4)  selection_tools.pl:  The selection switches, and facilities for inspecting, deleting, and
editing patterns.

(5)  selection_top_level.pl:  The interface between selection and the PUNDIT top level,
and various predicates to inspect and erase sentences recorded in a corpus.

(6)  selection_utilities.pl:  Miscellaneous utility predicates used by the selectional
component.

(7)  xxx_selection_db.pl:  Domain-specific files (xxx denotes the domain) containing the
selectional patterns originally stored in the SELECTIONAL_PATTERNS.pl file. Creating
the xxx_selection_db.pl file must be done manually by gathering all selectional patterns
collected in the SELECTIONAL_PATTERNS.pl file, verifying their correctness, and
then putting the resulting set of patterns in the xxx_selection_db.pl file in the
appropriate PUNDIT directory.

12.  Selection  Switches 

There are several switches which can be used to control the behavior of the selection component.
These switches have not yet been incorporated into the top-level PUNDIT switches
mechanism (as they should be), so the way to use and control these switches is likely to change.
However, this is how they currently work.

To check the current setting of the selection switches, type the goal sswitches. Every switch
has a default setting, indicated below by “(*)”, and one or more associated predicates to control
the setting of the switch. The switches currently supported in the selection mechanism are:


(1)  unknown_selection:  Controls the action of the program upon encountering an unknown
selectional pattern. The possible settings are

•  query: unknown patterns generate query to user (*)

•  succeed: unknown patterns automatically succeed

•  fail: unknown patterns automatically fail

To enable querying of unknown patterns, type the goal squery.

To allow unknown patterns to succeed, type the goal ssucceed.

To force unknown patterns to fail, type the goal sfail.

(2)  fileIO:  Controls whether or not the selectional patterns generated are written to the file
SELECTIONAL_PATTERNS.pl in addition to recording them in the recorded DB. The
possible settings are:

•  ON: patterns are output to file (*)

•  OFF: patterns are not output to file

To turn on fileIO, type the goal fileIO(on).

To turn off fileIO, type the goal fileIO(off).

Turning fileIO off will cause a dramatic increase in the real-time (but not the cpu time)
efficiency of selection, but then the patterns won't be saved to a file.

(3)  pattern_trace:  Controls the printing of trace messages detailing selectional patterns
generated and found in the pattern database. Possible settings are:

•  ON: tracing messages are printed for every pattern generated, showing the lexical pattern
found in the ISR, the class pattern generated from the lexical pattern, and any good
or bad patterns found in the database which match either the lexical or class pattern
generated.

•  OFF: no tracing messages are printed (*)

To enable the pattern trace, type the goal pattern_trace(on).

To disable the pattern trace, type the goal pattern_trace(off).

(4)  lnr_trace:  Controls the printing of trace messages showing the ISRs of LNRs being fed to
the DCG. Possible settings are:

•  ON: the ISRs are printed

•  OFF: the ISRs are not printed (*)

To enable the LNR trace, type the goal isr_trace(lnr,on).

To disable the LNR trace, type the goal isr_trace(lnr,off).

(5)  svo_trace:  Controls the printing of trace messages showing the ISRs of SVOs being fed to
the DCG. Possible settings are:

•  ON: the ISRs are printed

•  OFF: the ISRs are not printed (*)

To enable the SVO trace, type the goal isr_trace(svo,on).

To disable the SVO trace, type the goal isr_trace(svo,off).

To enable both LNR and SVO traces, type the goal isr_trace(on).

To disable both LNR and SVO traces, type the goal isr_trace(off).
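
The behavior of the unknown_selection switch in (1) can be sketched as a dynamic fact that is consulted whenever a pattern is not found in the database. This is an illustrative guess at the shape of such a mechanism, not the code in the selection tools file: set_unknown/1, handle_unknown/1, handle_unknown/2 and ask_user/1 are hypothetical names, and only the three goals squery, ssucceed and sfail come from the description above.

```prolog
:- dynamic unknown_selection/1.

unknown_selection(query).             % the documented default setting

squery   :- set_unknown(query).
ssucceed :- set_unknown(succeed).
sfail    :- set_unknown(fail).

% set_unknown(+Setting): hypothetical helper that replaces the stored setting.
set_unknown(Setting) :-
    retractall(unknown_selection(_)),
    assertz(unknown_selection(Setting)).

% handle_unknown(+Pattern): succeeds or fails according to the switch.
handle_unknown(Pattern) :-
    unknown_selection(Setting),
    handle_unknown(Setting, Pattern).

handle_unknown(succeed, _).
handle_unknown(query, Pattern) :- ask_user(Pattern).
% No clause for `fail`: the call simply fails, which fails the pattern.

% ask_user(+Pattern): hypothetical stand-in for the real query dialogue.
ask_user(Pattern) :-
    write('Is this pattern good: '), write(Pattern), write(' --> '),
    read(Answer),
    Answer == y.
```

After ssucceed, handle_unknown([sac, fail, '<NULL>']) succeeds without prompting; after sfail the same goal fails; after squery the user is asked.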

The three tracing switches (3), (4) and (5) are used to help debug the selection mechanism, and
are probably of little interest to anyone else, as far as I can imagine, so most people will probably
want to leave them set at their defaults.

13.  Inspecting, Deleting, and Editing Selectional Patterns


After parsing merrily along for a while, the user might want to see what selectional patterns
have been recorded, and possibly to delete or change some incorrect ones. A large number of
predicates have been provided for inspecting, deleting and editing selectional patterns. All these
predicates have to call a massive setof, so if there are a great many selectional patterns
recorded, they can take a while.


13.1.  Inspecting Patterns

To see all the selectional patterns currently recorded, type the goal check_selection.

To see all the selectional patterns currently recorded which contain the word W, type the goal
check_selection(word,W).

To see all the selectional patterns currently recorded which were generated by sentence S, type
the goal check_selection(sent,S).

To see all the selectional patterns currently recorded of a given type T (e.g., svo, n/adj, v/pp,
etc.), type the goal check_selection(exact_type,T).

If one is unsure of the exact type of the pattern one is looking for, all is not lost. The goal
check_selection(general_type,T) will show all the selectional patterns currently
recorded whose type contains T as one of its components (i.e., one of the constituents on either
side of the “/” in the name of the pattern). For example, if one wants to see a pattern including a qn,
but one is not sure if the specific pattern is, say, a n/qn or a qn/n, typing the goal
check_selection(general_type,qn) will show all patterns of any type which includes a
qn.

There is at present no mechanism for examining all selectional patterns containing a word of a
given semantic class.


13.2.  Deleting Patterns

There are variations of all of the predicates described above which can be used to delete selectional
patterns. Instead of typing check_selection (with either 0 or 2 args) one should type
erase_selection (with either 0 or 2 args).

erase_selection (0 arguments) will erase all selectional patterns. Period. There is no
prompting, and no confirmation, so be careful! Note that this can also be done using
rdb_remove.

However, using instead one of the following goals

erase_selection(word,W).
erase_selection(exact_type,T).
erase_selection(general_type,T).
erase_selection(sent,S).

will present all the relevant patterns (e.g., all patterns containing the word W), and ask which
ones, if any, to delete. As is the case with examining patterns, there is at present no mechanism
for deleting all selectional patterns containing a word of a given semantic class.

For example, if one wanted to delete patterns containing the word break, one would type
erase_selection(word,break). A possible result would be

These are the patterns containing "break" which are currently stored:

1:  (BAD)   <v/pp>  [break,off,attack]
2:  (BAD)   <v/pp>  [break,off,situation]
3:  (GOOD)  <svo>   [<SOMEBODY/THING>,break,engagement]
4:  (GOOD)  <svo>   [<SOMEBODY/THING>,break,process]

Enter your choices to delete ("h." = help) ==>

The patterns are numbered for reference. The prompt is explicit about what to answer;
in case of doubt, typing “h.” as an answer will generate the following even-more-explicit
message:

Please enter one of the following:
-- the numbers of the patterns you want to delete (e.g., "1, 2, 23")
-- "all" to delete all patterns
-- "none" to delete none
followed by a period.

In order to delete some, but not all the patterns, one need not type the numbers in a he[...] The
interface here is extremely flexible. The numbers can be typed in separated by commas, or
by hyphens if one wants to delete a range of patterns. For example, to delete patterns 1, 2,
6, 7, 8, 9, 10, and 23, one can just type “1, 2, 6-10, 23.”
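The answer format described here (numbers, hyphenated ranges, "all", or "none", followed by a period) can be sketched in a few lines. The following Python fragment is only an illustration of the format, not PUNDIT's Prolog implementation; the function name parse_choices and its error handling are assumptions made for the example.

```python
def parse_choices(answer, n_patterns):
    """Turn an answer such as "1, 3-5, 9." into a sorted list of pattern numbers."""
    text = answer.strip()
    if not text.endswith("."):
        raise ValueError("the answer must end with a period")
    text = text[:-1].strip().lower()
    if text == "all":
        return list(range(1, n_patterns + 1))
    if text == "none":
        return []
    chosen = set()
    for item in text.split(","):
        item = item.strip()
        if "-" in item:
            # A hyphen denotes an inclusive range of pattern numbers.
            low, high = (int(part) for part in item.split("-", 1))
            if low > high:
                raise ValueError("bad range: " + item)
            chosen.update(range(low, high + 1))
        else:
            chosen.add(int(item))
    for number in chosen:
        if not 1 <= number <= n_patterns:
            raise ValueError("no pattern numbered %d" % number)
    return sorted(chosen)
```

For instance, with ten patterns displayed, the answer "1, 3-5, 9." selects patterns 1, 3, 4, 5, and 9.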

Another nice thing about these predicates is that they will not stonewall. For example, if the
user asks for patterns of type T, and T is a valid pattern type but there don’t happen to be
any patterns of that type recorded (e.g., if the user asks to see SVO patterns, but there are no
SVO patterns recorded), a message will be printed, warning that

There are no patterns of type svo currently recorded.

However, if T is not a valid pattern type at all, the user will be told that T is not a pattern
type, and a message will be shown presenting the valid pattern types.

One last aspect of deleting patterns involves the “abort” answer to the selectional pattern
query, which should be used if the user enters an incorrect answer.

After answering a (for abort), the following will happen: First, the user is given a chance to
undo the command to abort. If the user does indeed want to abort parsing, all patterns
generated by the current sentence will be presented (in the format shown above), and the user will
be given a chance to delete any, all, or none of them. The user will then be given a chance to
erase the current sentence itself (in case the sentence itself was incorrectly entered), and finally,
the parsing will abort, and Prolog will return to the top-level prompt.

Note that the erase_selection family of predicates will only affect the state of selection in
the current Prolog session. It is the user’s responsibility to make appropriate modifications to
any files (e.g., SELECTIONAL_PATTERNS.pl) which contain the selectional data.

13.3.  Editing Patterns


There are variations of all of the predicates described above which can be used to edit
selectional patterns. This mechanism uses the Prolog Structure Editor. Instead of typing
check_selection or erase_selection, just type edit_selection. (This predicate
exists only in a 2-argument version.)

Calling

edit_selection(word,W).
edit_selection(exact_type,T).
edit_selection(general_type,T).
edit_selection(sent,S).

will present all the relevant patterns (e.g., all patterns containing the word W), and ask which
ones, if any, to edit. The acceptable responses are the same as those for deleting patterns.
Once the user has selected which patterns to edit, the mechanism will then invoke the Prolog
Structure Editor on each pattern selected, and modify the selection DB accordingly.

It is again the user’s responsibility to make appropriate modifications to any files (e.g.,
SELECTIONAL_PATTERNS.pl) which contain the selectional data.

14.  Future Plans

There  are  a  number  of  specific  areas  in  which  the  selection  module  needs  to  be  modified,  some  of 
which  have  been  noted  previously: 

•  Improving  the  treatment  of  anaphoric  and  elided  elements  (such  as 
<SOMEBODY/THING>  and  nulln)  in  selectional  patterns  to  allow  the  propagation  of 
attributes  deduced  by  selection. 

•  Extending  the  explanation  facility  (the  “r”  option)  to  the  word  level  selection  prompt. 

•  Optimizing the matching of class-level patterns to word-level patterns. Several
approaches have been considered. One suggestion has been to allow uninstantiated logic
variables to be part of patterns. This solution has been partly implemented, but certain
problems have not been solved concerning how to index on patterns containing variables.
The approach of compiling the isa hierarchy directly into Prolog unit clauses has also
been tried. The result was a noticeable improvement in execution time, but at the cost of
compiling in a large file containing approximately 1000 unit clauses. Another technique to be
considered is the use of narrowing or feature intersection (a la LOGIN).

•  Automatic generalization or success for certain specific patterns. For example, any
part-whole relation in a noun/noun pattern such as [submarine,hull] should be allowed to
automatically succeed.

•  New  names  should  be  used  for  the  patterns  because  the  slash  “/”  has  been  overloaded  in 
the  name  of  patterns,  since  it  denotes  conjunction,  modification,  and  siblinghood. 

•  The  ability  to  examine,  delete,  and  edit  all  patterns  of  a  certain  semantic  class  should  be 
added. 

•  The selection switches should be incorporated into the top-level switches mechanism.

PUNDIT’S SYNTACTIC COMPONENT
DESCRIPTION  OF  COVERAGE 


Lynette  Hirschman 


1.  Introduction  to  String  Grammar 

This document will provide an overview of PUNDIT’s approach to syntax, based on string
grammar (Z. Harris 1968, N. Sager 1981). Following the overview, the coverage of PUNDIT is
sketched, followed by a subsection providing some information on tools and strategies
for debugging PUNDIT’s grammar.

PUNDIT’s Restriction Grammar is an adaptation of Sager’s well-documented Linguistic
String Grammar. Since our approach has been driven by the need to cover constructions in the
particular texts we were dealing with, we have added constructions to PUNDIT as needed.
This means that by and large PUNDIT’s coverage of standard English is a subset of the
grammar  given  in  Sager’s  book,  although  PUNDIT  contains  additional  constructions  not  docu¬ 
mented  in  the  book,  such  as  an  extensive  treatment  of  sentential  fragments.  Also,  over  the 
years,  we  have  deviated  from  Sager’s  string  grammar  treatment.  Some  deviations  are  minor  (a 
more  uniform  treatment  of  modal  verbs)  and  some  are  major  (the  meta-rule  treatment  of  con¬ 
junction  and  of  wh-clauses).  These  will  be  discussed  in  later  subsections.  For  a  general  over¬ 
view  of  string  grammar,  Sager’s  book  remains  the  best  reference  work.  Many  of  the  "missing” 
constructions  in  PUNDIT  could  be  readily  added  by  consulting  Sager’s  treatment. 

One of the major extensions to Sager’s system is the use of regularization rules with each
production in the grammar. These rules describe, in a form of lambda calculus notation, how to
combine the daughters of a given node into a regularized operator-operand notation that
normalizes syntactic relations and makes explicit many of the gapped elements. The output of the
regularization is called the Intermediate Syntactic Representation or ISR. It is the ISR that is
passed on to semantics and selection, since it is far more regular than the surface syntactic
analysis. However, the ISR, its rules and its mechanism will be discussed separately; this section
will focus exclusively on syntactic coverage.


String  Grammar  Rules  and  Restrictions 

String  grammar  is  written  as  context-free  rewrite  rules,  in  the  form  of  BNF  definitions, 
augmented  by  restrictions,  which  are  constraints  on  the  well-formedness  of  the  (partial)  parse 
tree.  As  the  BNF  definitions  are  applied  in  string  grammar,  a  partial  parse  tree  is  built  up, 
corresponding  to  the  definitions  applied.  The  restrictions  are  interspersed  with  BNF  definition 
expansion,  and  check  to  see  that  the  parse  tree  constructed  so  far  is  well-formed.  Restrictions 
are  used  to  check  for  things  such  as  agreement  (subject-verb,  determiner-adjective-noun),  object 
subcategorization  (so  that  verbs  take  only  objects  that  they  are  subcategorized  for),  and  posi¬ 
tional  constraints.  These  are  the  well-formedness  restrictions,  which  fire  on  node  completion.  In 
addition,  there  are  a  number  of  optimization  restrictions,  that  check  to  see  if  the  pre-conditions 
for  a  particular  construction  hold.  These  are  "disqualify"  restrictions,  which  fire  before  node 
construction  begins.  A  (simplified)  grammar  rule  might  be  that  assertion  constructs  a  subject, 
followed  by  a  verb,  followed  by  execution  of  the  w_agree  well-formedness  restriction,  followed 
by  construction  of  the  object. 

assertion ::= subject, verb, {w_agree}, object.

Here  we  see  the  well-formedness  ("w_")  agreement  restriction,  firing  on  completion  of  the  verb 
node,  to  check  subject-verb  agreement. 
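To illustrate how a well-formedness restriction interleaves with node construction, here is a minimal Python sketch. It is not PUNDIT's Prolog implementation: the dictionary-based tree, the "number" feature, and the function build_assertion are assumptions made for the example; only the order of the rule (subject, verb, agreement check, object) and the name w_agree come from the text.

```python
def w_agree(tree):
    """Well-formedness check: subject and verb must agree in number."""
    return tree["subject"]["number"] == tree["verb"]["number"]


def build_assertion(subject, verb, obj):
    """Build subject, then verb, run the agreement check, then attach the object."""
    tree = {"subject": subject}
    tree["verb"] = verb
    # The restriction fires as soon as the verb node is complete, so a bad
    # parse is abandoned before any work is spent on building the object.
    if not w_agree(tree):
        return None
    tree["object"] = obj
    return tree
```

A plural subject with a singular verb is rejected at the verb, and the object is never attached.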


Strings and LXR Nodes

String grammar distinguishes two types of constructs: head/adjunct (endocentric) constructions
and string (exocentric) constructs. An endocentric construction has a head flanked
by left modifiers and right modifiers; the behavior of an endocentric construction is governed by
its head; that is, a noun phrase is noun-like in its behavior; an adjective phrase is "adjective-like",
etc. The endocentric constructions are called lxr constructions in string grammar, where
X stands for the head, flanked by its left (l) and right (r) modifiers, which may be empty. The
modifier nodes are called lx and rx respectively (where X is the head of the construction).
Below are listed some of the important lxr constructions in string grammar; terminal nodes (lexical
classes) are indicated by an asterisk. Note that the basic lexical classes (nouns, adjectives,
verbs and verb participles) have associated lxr constructions.

lnr    ::= ln, nvar, rn.     % nvar = noun or variant (e.g., gerund)
lar    ::= la, *adj, ra.     % adj = adjective
lqr    ::= lq, *q, rq.       % q = quantity word, e.g., "two"
lvr    ::= lv, *v, rv.       % v = infinitive form of verb
ltvr   ::= lv, *tv, rv.      % tv = tensed verb
lvingr ::= lv, *ving, rv.    % ving = present participle of verb
lvenr  ::= lv, *ven, rv.     % ven = past participle of verb

For verbs, the left and right modifiers (lv, rv) consist of slots for adverbials, e.g., not or
quickly. For adjectives, the left modifier la consists of an adverb slot for adverbs such as very;
the right modifier ra consists of a list of options, including prepositional phrase (suspicious
of something), sentential complement (certain that they left) and adverbials. For nouns, the left
modifier, ln, consists of slots for the determiner (tpos = the position), quantity (qpos), adjectives
(apos) and noun modifiers (npos). The right noun modifier (rn) consists of a list of options for
prepositional phrase, relative clause, adjective, appositive, etc.

lv ::= [...] null.           % dstg = adverb slot
rv ::= [...] dstg; null.     % pn = preposition + noun = prepositional phrase
la ::= [...] null.
ra ::= [...]                 % that + sentence
ln ::= tpos, qpos, apos, npos.
rn ::= [...]

The other important construction in string grammar is the string. A string is an exocentric
construction, that is, a construction whose behavior differs from that of its constituents. For
example, an assertion cannot be considered "verb-like" or "subject-like" in its syntactic properties.
A string is made up of two or more obligatory elements; the elements of a string are obligatory,
except for the sa or sentence adjunct elements. Examples of important string constructions
are:

assertion ::= sa, subject, sa, ltvr, sa, object, sa.    % sa = sentential adjunct
pn        ::= *p, nstgo.                                % prepositional phrase
qn        ::= lqr, *n.                                  % e.g., "two foot"
vingo     ::= lvingr, sa, object.                       % e.g., "reading a book"
vo        ::= lvr, sa, object.                          % e.g., "read a book"
venpass   ::= lvenr, sa, passobj.                       % passive, "seen by me"
tovo      ::= to, lvr, sa, object.                      % e.g., "to read a book"

Strings  include  basic  constructions  such  as  assertion,  question,  and  imperative.  The  prepositional 
phrase  (pn)  is  also  a  string,  since  its  behaviour  is  neither  that  of  a  preposition  nor  that  of  a 
noun  (in  fact,  it  is  often  adverbial). 

The philosophy of string grammar is to include a slot for each element; the realization of
that element may be the empty string if the element is optional (e.g., adjuncts), or if it has been
"zeroed" (reduced to zero or null) for some other reason (e.g., gapping). The advantage to this
approach is that the skeletal parse tree is very regular. For example, an assertion always contains
nodes for subject, verb, and object, separated by sentence adjunct slots. However, many of
those nodes may be empty (for example, the object can be realized as nullobj for an intransitive
verb). The adherence to this philosophy reduces the number of grammar rules and makes for
efficient top-down parsing, but also makes for bushy trees with many empty nodes.

Object  Options  In  String  Grammar 

One large group of strings is the class of objects. String grammar handles auxiliaries as
instances of verb + complex object. This gives a very regular, recursive structure to the object
node in string grammar. At the top level, we have the tensed verb, followed by an object. If
the tensed verb is a modal, its object will be vo (an infinitive followed by object), e.g., I may
read the book. If the tensed verb is be, the object may be the participial object vingo, e.g., I am
reading a book, etc. This means that objects carry a great deal of information, and may often
contain the "meaning bearing" verb, where there are auxiliaries, as in It may have to be
reviewed, where review is the meaning bearing verb, embedded in successive objects, as follows:

assertion
    |
subject  sa  ltvr  sa  object
   |          |          |
   IT        MAY         vo
                         |
                    lvr  sa  object
                     |          |
                   HAVE        tovo
                                |
                        to  verb  sa  object
                        |    |          |
                        TO   BE      venpass
                                        |
                               lvenr  sa  passobj
                                 |           |
                             REVIEWED     nullobj

Note that at the "bottom" of this construction is the node nullobj. This indicates absence of an
overt object. It is used to fill the object slot of intransitive verbs and also the empty object slot in
the passive object passobj, as in the tree above.
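The nesting of auxiliaries inside successive objects means that the meaning bearing verb can be found by walking down the chain of verbal objects. The Python sketch below illustrates that walk on a hand-built structure; the dictionary layout and the function name are assumptions made for the example, and only the node names (vo, vingo, venpass, tovo, nullobj, nstgo) come from the text.

```python
# Object types that themselves contain a verb followed by a further object.
VERBAL_OBJECTS = {"vo", "vingo", "venpass", "tovo"}


def meaning_bearing_verb(node):
    """Follow nested verbal objects down to the innermost verb."""
    while node["object"]["type"] in VERBAL_OBJECTS:
        node = node["object"]
    return node["verb"]


# "It may have to be reviewed", with one level of nesting per auxiliary.
it_may_have_to_be_reviewed = {
    "type": "assertion", "verb": "may",
    "object": {"type": "vo", "verb": "have",
               "object": {"type": "tovo", "verb": "be",
                          "object": {"type": "venpass", "verb": "reviewed",
                                     "object": {"type": "nullobj"}}}},
}
```

Applied to the structure above, the walk passes through may, have, and be before stopping at reviewed, whose object is nullobj.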

Strings,  LXRs  and  Disjunctive  Rules 

In general, there are three basic types of rules in string grammar: lxr constructions, string
constructions, and disjunctive rules. A disjunctive rule consists of a series of single-element
choices. For example, the object rule is a disjunctive rule, naming all the possible object
options, separated by semicolons (indicating disjunction). The ordering of disjunctions in a rule
will affect which parse is found first, since options are applied in order. If the grammar is used
with the assumption that the first parse will be the one used, then ordering of options can
become important. However, if the system is allowed to run to all parses, then each option will
eventually be tried.

object ::=
    nstgo;      % noun string object
    vingo;      % present participle + object
    vo;         % infinitive + object
    venpass;    % passive object
    tovo;       % to + verb + object
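The effect of option ordering can be shown with a toy example. In the Python sketch below, each option of a disjunctive rule is a (name, recognizer) pair and options are tried in order; the recognizers, the tiny word list, and the function name are all invented for illustration and have nothing to do with PUNDIT's actual parser.

```python
NOUNS = {"book", "reading"}

# Toy recognizers: each says whether an option could start at the given words.
OPTIONS = [
    ("nstgo", lambda words: words[0] in NOUNS),
    ("vingo", lambda words: words[0].endswith("ing")),
]


def try_options(options, words):
    """Yield the name of every option that accepts the input, in rule order."""
    for name, accepts in options:
        if accepts(words):
            yield name


first_parse = next(try_options(OPTIONS, ["reading"]))
first_parse_reordered = next(try_options(list(reversed(OPTIONS)), ["reading"]))
all_parses = list(try_options(OPTIONS, ["reading"]))
```

Taking only the first result depends on the order of the options, whereas collecting all results yields the same set of analyses whichever order is used.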

The expansion of the nvar rule is also a disjunctive rule:

lnr  ::= ln, nvar, rn.

nvar ::= *n;         % noun
         namestg;    % proper name construction
         *ving;      % gerund
         nulln.      % empty head, e.g., "the few (nulln) are here"

By contrast, any rule that has multiple required elements (indicated by ",") is either a string, or
it expands an lxr node into left adjunct + head + right adjunct, or it involves punctuation. In
general, rules do not mix options (disjunction) and required elements (conjunction). There are,
of course, a number of exceptions to this principle, but it is an important one to follow when
writing grammar rules, since it preserves clarity and maintains the necessary separation
between string definitions, lxr definitions, and disjunctive rules. For the conjunction meta-rule to
work properly, for example, it is important to identify string and lxr type definitions.

Empty  Elements 

One of the unusual features of string grammar is the proliferation of empty elements.
Since adjunct slots are included as part of the basic node definitions, the result is that these are
often unfilled (indicated by a null element). There is also nullobj, which indicates an empty
verb object for intransitive verbs. In addition to these, there are many other flavors of empty
elements which carry important information for construction and regularization of the surface
syntax. There is the nulln filler for nvar, as in these three are missing. There are several kinds
of null elements associated with fragmentary input; there is a special kind of null (nullc) for
handling gapping in conjunction, and yet another (nullwh) for handling gapping in
wh-constructions. Being able to distinguish the kind of empty string found in a given location aids the
later regularization and semantic phases in reconstruction of the missing information.

Meta-Rules: Conjunction and Wh

One  of  the  major  departures  of  PUNDIT’s  Restriction  Grammar  from  Sager’s  string  gram¬ 
mar  is  PUNDIT’s  use  of  meta-rules  to  capture  certain  high-level  regularities.  The  conjunction 
meta-rule  mechanism  is  installed  in  the  current  PUNDIT  system.  It  operates  on  the  set  of  BNF 
definitions  (without  conjunction)  and  produces  a  new  set  of  grammar  rules  which  cover  most 
cases of conjoining and gapping under conjunction. The meta-rule expands each node of type
string or lxr to include, as one option, a conjunction followed by a recursive call to the rule.
Thus the expansion for lnr (simplified) is:

(lnr ::= ln, nvar, rn) ->
(lnr ::= ln, nvar, rn;
         ln, nvar, rn, conj_wd, lnr).

Thus the lnr node can either be expanded as usual, or it can invoke the conjunction option,
which has a conjunction word followed by a recursive call to lnr. (In actuality, the rule is
written more efficiently, so that the ln+nvar+rn does not have to be rebuilt if there is a
conjunction.) Thus BNF definitions can be written without worrying about conjunction, as long as
nodes are properly classified as lxr or string nodes. The meta-rule component is then applied
to generate automatically the correct rules to support optional conjoining.
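The idea of a meta-rule that rewrites a grammar can be sketched as a function from one rule set to another. The Python fragment below is an illustration only: the grammar is represented as a dictionary from node names to lists of alternatives, the set of conjoinable node names stands in for the lxr and string type lists, and the sketch assumes the naive expansion in which the whole original body is repeated before the conjunction word (the text notes that the real rule is written more efficiently).

```python
def add_conjunction(grammar, conjoinable):
    """Return a new grammar in which every conjoinable node gains, for each
    alternative, an extra option: original body + conj_wd + recursive call."""
    expanded = {}
    for name, alternatives in grammar.items():
        new_alternatives = [list(alt) for alt in alternatives]
        if name in conjoinable:
            for alt in alternatives:
                new_alternatives.append(list(alt) + ["conj_wd", name])
        expanded[name] = new_alternatives
    return expanded


base_grammar = {
    "lnr": [["ln", "nvar", "rn"]],
    "nvar": [["*n"], ["namestg"], ["*ving"], ["nulln"]],
}
with_conjunction = add_conjunction(base_grammar, conjoinable={"lnr"})
```

Here lnr gains a second, recursive alternative, while the disjunctive rule nvar is left untouched because it is neither an lxr nor a string node.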

Wh-constructions (relative clauses, questions, indirect questions, reduced relatives) are also
handled by meta-rules. Here, the function of the meta-rule is to introduce parameters
into each definition, so that gap information can be passed around, namely the need for a gap,
or the fact that a gap has been found. This makes the handling of wh-constructions invisible to
the grammar writer, who need only worry about routine constructions. The treatment of
wh-constructions combines in a very natural way with the meta-rule treatment of conjunction.

Naming  Conventions 

String grammar has a fairly mnemonic set of naming conventions, once you get used to it.
For example, objects are named by their components, e.g., tovo = TO + Verb + Object, or pn
= Preposition + Noun. Somewhat confusing is the stg suffix, as in nstg, astg, dstg. Although
stg stands for string, in fact NONE of the things named by stg are strings. They are lxr
constructions. Once you get past that basic confusion (of unknown historical origin), the names are
fairly logical.

Type  Lists 

Since there are certain generalizations associated with strings and lxr nodes, these are
captured by type lists, which allow the grammar writer to define types, and then to use the
associated type names in writing restrictions. For example, there is a type lxr, a type lx, a type rx,
and a type string. The type lxr nodes have an operation on them called core, which goes to
the head of the lxr construction; this operation is used in restrictions, which often state
constraints between heads of syntactic constructions, e.g., between the head of the subject and the
head (tensed verb) of the ltvr node for subject-verb agreement. Both the lx and the rx nodes
belong to a broader type, the adjunct type. Adjuncts can typically be empty; the adjunct slot
is the string grammar mechanism for allowing optional elements.

2.  Coverage  of  PUNDIT’s  Grammar 

This section will summarize the current state of PUNDIT’s coverage. As mentioned in
the introduction to this document, coverage has very much been driven by the needs of the
particular domains we have processed. As a result, it is somewhat uneven, although quite broad.

Noun  Phrases 

Coverage of noun phrases is generally very good. It includes treatment of complex
pre-nominal modifiers: multiple nouns, adjectives, qn expressions such as a two-foot deep hole, and
nq expressions, such as the number 2 pump. Nominalizations are handled as ordinary noun
phrases in the syntax, so they are covered and later converted by semantics to capture the
underlying verb semantics. A wide range of post-nominal expressions are also covered, including
multiple prepositional phrases, participial expressions (the book read by the students, the person
running the race), adjective expressions (the student present for the exam), and
parenthetical expressions (Florence Joyner, the Olympic athlete, and my PC (the one I bought a
month ago)). Relative clause coverage has been greatly expanded with the introduction of the
new wh-module and includes both standard relative clauses and zero-complementizer relatives
(the person I saw). Pronouns are handled by a separate lpror option for the noun phrase; this is
done because pronouns take a highly restricted set of left and right adjuncts, compared to nouns.

Adjective  Phrases 

Coverage of adjective phrases, in pre-nominal position, predicative position and verb
complement position, is extensive. In predicative and verb complement positions, adjectives can take
complex right modifiers, including prepositional phrases (certain of a fact) and a variety of
clausal complements (certain that they came, certain to come). In the left adjunct slot,
adjectives can be modified by adverbs, e.g., very certain.

Adverbials

The  coverage  of  adverbials  in  PUNDIT  includes  left  and  right  modifiers  and  a  recursive 
definition  (e.g.,  for  very  long). 

Verb  and  Verb  Complements 

Our current grammar includes more than forty classes of verb complement (object).
Selection of the appropriate complement set is controlled by a pruning mechanism that takes the
intersection of the verb’s subcategorization constraints (given in the objlist for the verb entry in
the lexicon) with the set of object options. Classes of complement types include:

direct object,
ditransitive,
objects of auxiliary verbs:
    vo (I may read the book);
    vingo (I am reading the book);
    veno (I have finished the book);
    venpass (She was given the book);
objects of be and other copulative verbs:
    objbe (They are here/at home; they remain leaders),
direct object + prepositional phrase,
particle + various object types
    (e.g., close up, close up the store, close the store up),
clausal objects
    (e.g., I said that I would come; it seemed to be raining),
equi-verb objects
    (e.g., I wanted to go),
small clauses
    (e.g., they painted the house red).

Each of these object options has a regularization rule associated with it that allows correct
reconstruction of the underlying semantics, including correct handling of subject/object control
issues. This is done by the Intermediate Syntactic Regularization component and will not be
further discussed here; see the PUNDIT Guide to Verb Objects for more complete
documentation of PUNDIT’s object options. One respect in which PUNDIT’s treatment of object differs
from string grammar is in a uniform treatment of modals, which simply take the object option
vo, namely infinitive verb + object.
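The pruning step mentioned at the start of this subsection, intersecting a verb's subcategorization list with the grammar's object options, can be sketched as follows. This Python fragment is illustrative only: the option list is a small subset, and the two lexical entries are hypothetical examples rather than entries from PUNDIT's lexicon.

```python
# A small subset of object options, in the order the grammar would try them.
OBJECT_OPTIONS = ["nstgo", "vingo", "vo", "venpass", "tovo", "nullobj"]

# Hypothetical objlist entries for two verbs.
OBJLIST = {
    "may": ["vo"],
    "repair": ["nullobj", "nstgo"],
}


def prune_object_options(verb):
    """Keep only the grammar options the verb is subcategorized for,
    preserving the grammar's own ordering of the options."""
    allowed = set(OBJLIST.get(verb, []))
    return [option for option in OBJECT_OPTIONS if option in allowed]
```

Because the result follows the grammar's ordering rather than the lexicon's, a modal such as may is left with the single option vo, and the parser never attempts the other object types after it.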


Sentential  Adjuncts 

The grammar covers a variety of sentential adjuncts, including adverbial modifiers
(adverbs and prepositional phrases), purpose clauses (I did it to win), and a range of subordinate
clauses (until finished; before they came; after running the race). It now also covers a class of
adverbial phrases consisting of a lone noun phrase. In normal English, this includes time
expressions, e.g., I left last week. Also needed for message texts is a similar location adverbial
construction, such as lesion right lung, where right lung is a locative phrase without a preposition. Both
of these require strong selectional or semantic constraints, in order to avoid taking almost any
noun phrase in any adjunct slot. Not included yet are right-dislocated relative clauses (the
person came whom I wanted to meet).


Conjunction 

The conjunction meta-rule component generates rules to handle conjunction from the basic
BNF definitions. Conjoining is allowed only at lxr and string type nodes, which eliminates some
of the spurious ambiguity that can be associated with treatments of conjunction. The current
mechanism handles a variety of conjunctions (and, or, but), paired constructions (both...and,
neither...nor) and "comma-conjunction" (use of comma to take the place of an explicit
conjunction in a list such as apples, oranges and pears). Since the meta-rule generates a recursive
definition, an arbitrarily long series of conjunctions can be handled.

lnr ::= ln, nvar, rn
=>
lnr ::= ln, nvar, rn;
        ln, nvar, rn, conj_wd, lnr.

In  addition,  the  meta-rule  component  allows  for  gapping  under  conjunction.  In  particular, 
it can handle gapped subject, gapped object, and gapped verbs, as follows:

I  mixed  up  the  batter  and  baked  the  cookies. 

I  cooked  and  they  ate  the  cookies. 

I  baked  the  cookies  and  Robin  the  cake. 

At the moment, there are certain constructions that are not handled by the current conjunction
mechanism. One problem is that conjunction requires homogeneity: only like objects can be
conjoined, for example. Thus PUNDIT cannot parse the construction my friends and I because
the first conjunct is lnr and the second is lpror. Also, certain kinds of partially gapped objects
are not handled. For example, they broke through and demolished the plate glass window has a gap in the
first conjunct that is embedded in the prepositional phrase object, following the preposition
through. Thus the object is partially gapped, which is not currently handled.

Wh-  Constructions 

The new meta-rule component for wh-constructions now covers questions, relative clauses
and indirect questions (I don’t know what they want). We plan to extend it shortly to cover
headless relative constructions (Whatever you need is here) as well. It supports the interaction
between conjunction (and its gaps) and the wh-constructions (and their gaps).


Fragments 

Because much of our work has been focused on message traffic, PUNDIT supports a
comprehensive, elegant treatment of fragmentary and run-on sentences that are characteristic
of message text. There are five basic fragment types, including fragments for missing subject
(tvo: was repaired), missing verb (zero_copula: disk bad; disk repaired), missing subject and
verb (predicate: broken since yesterday), missing object (engineer repaired), and noun phrase
fragment (nstg_frag: bad drive). Other recently added center string rules include rules for
response fragments, necessary to handle certain kinds of question/answer interchanges, e.g., Are
you going? Yes.

2.1.  Debugging  Tools  and  Advice 

There are a few tools that are useful in debugging parses that either fail or are incorrect.
First, the grammar may be called on any constituent, not just sentences. For example, to see if
something parses as a noun phrase, the parser can be called via parse(nstg), which will prompt
for input, and will produce parses of all substrings of the input that can be parsed as a noun
phrase. This is often useful in a divide-and-conquer approach to debugging, in which each phrase
can be checked for its parse.

The grammar can be run in two modes: interpreted and translated. When run interpreted,
grammar rules are applied as data structures; they do not constitute Prolog procedures,
so the normal Prolog spy mechanism does not work. However, restrictions can be spied on, even
in interpreted mode. This is often useful to get a sense of how far the grammar has gotten, e.g.,
if you spy on w_agree, which follows completion of the verb, you know that the verb has been
built. Since the parse tree is passed as a parameter to each restriction, spying on a restriction
also gives you the current (pretty-printed) partial parse tree, which can be useful. The interpreted
mode has a grind mechanism available, which allows the user to specify a list of
definitions, or all definitions, to be printed out each time they are applied in generating a parse.
This is not interactive, but can be instructive if one has the patience to follow each application
of a set of rules.

When  running  in  translated  mode,  all  grammar  definitions  are  translated  into  procedure 
calls.  Thus  any  definition  can  be  traced  via  the  normal  spy  mechanism.  Again,  the  parse  tree  is 
present  as  a  parameter,  so  you  will  also  be  able  to  see  the  tree.  This  is  very  convenient  for 
debugging. 

In  addition  to  these  limited  tools,  there  are  general  strategies  for  debugging.  Step  one  is  to 
make  sure  that  you  are  running  with  selection  turned  off  and  with  semantics  turned  off.  Either 
of these can cause unannounced failures when turned on. (They are turned on and off via the
switches mechanism; see the PUNDIT User's Guide for information on switches.) Step two is to
either  simplify  the  sentence  or  to  try  the  divide  and  conquer  method,  parsing  constituents  one  at 
a  time.  The  idea  is  to  find  the  problem  area  in  the  sentence  and  to  be  able  to  reproduce  the 
bug  on  a  minimal  structure.  Once  you  know  approximately  in  what  construction  the  bug  is 
occurring,  you  can  either  try  spying  selected  restrictions  (and  definitions,  if  running  translated) 
or  try  the  brute  force  method  of  grinding.  At  this  point,  however,  it  should  be  emphasized  that 
debugging  the  grammar  remains  an  art,  rather  than  a  science. 
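The idea of reducing a failing sentence to a minimal structure can be sketched as a simple greedy loop. The Python below is an illustration under stated assumptions, not part of PUNDIT: `fails` is a hypothetical stand-in for "run the parser on these words and report whether the bug still shows up", and the one-word-at-a-time strategy is just one possible way to simplify the input.

```python
def minimize_failing_input(tokens, fails):
    """Greedily drop one token at a time while `fails` still returns True.

    `fails` is a caller-supplied (hypothetical) predicate that takes a list
    of tokens and returns True when the problem is still reproduced.
    """
    current = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            # Keep the shorter input only if it is non-empty and still fails.
            if candidate and fails(candidate):
                current = candidate
                changed = True
                break
    return current


if __name__ == "__main__":
    # Stand-in predicate: pretend the bug needs both of these words present.
    def fails(toks):
        return "since" in toks and "yesterday" in toks

    print(minimize_failing_input(["disk", "broken", "since", "yesterday"], fails))
```

The result is a smaller input on which spying or grinding is easier to follow; it is minimal only in the sense that no single word can be removed without losing the failure.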


ANNOTATED ALPHABETIC LISTING OF BNF DEFINITIONS
Adapted from Sager, Natural Language Information Processing, pp. 310-321,
with additional annotations for PUNDIT usage.
** assembled 9/88 by Lynette Hirschman; updated 9/89

***************************** Annotations *****************************

# indicates NOT in current PUNDIT system
$ indicates in PUNDIT, but NOT in Sager's book.
! indicates significant difference in PUNDIT from Sager's treatment.

***********************************************************************
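For readers who want to scan the listing mechanically, the annotation markers can be read off the first field of each entry header. The Python below is a minimal sketch, not a PUNDIT tool: the function name, the status strings and the "unmarked" label for entries without a marker are assumptions made for this example.

```python
# Status text for each annotation marker used in the listing.
MARKERS = {
    "#": "not in current PUNDIT",
    "$": "in PUNDIT, not in Sager",
    "!": "differs significantly from Sager",
}


def classify_entry(header_line):
    """Return (definition name, status) for one entry header line."""
    fields = header_line.split()
    if not fields:
        raise ValueError("empty header line")
    if fields[0] in MARKERS:
        if len(fields) < 2:
            raise ValueError("marker without a definition name")
        return fields[1], MARKERS[fields[0]]
    # No marker: the entry carries none of the three annotations.
    return fields[0], "unmarked"


if __name__ == "__main__":
    print(classify_entry("# andstg and string, to handle conjunction."))
    print(classify_entry("apos Adjective POSition of the ordered left adjuncts of a noun"))
```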


adjadj  recursive definition of pre-nominal ADJectives
defined as: {d_adjadj},lar1,(adjadj;~);
lqnr,(adjadj;~)

!  adjinrn  ADJective  IN  RN  (right  adjuncts  of  the  noun) 

PUNDIT  handles  by  astg  option  in  rn. 

# adjn  ADJective + n (noun phrase); permutation of object option na,

as  in  "painted  red  the  house  which  I  saw  last  week" 

!  adjpreq  ADJective  Pre  (i.e.  before)  Q  (quantifier), 

handled as q option in lq (left-quantifier adjunct).

#  andstg  and  string,  to  handle  conjunction. 

PUNDIT  handles  conjunction  differently,  via  metarule. 

# and-orstg  and/or string for conjunction

PUNDIT  handles  conjunction  differently,  via  metarule. 

apos  Adjective  POSition  of  the  ordered  left  adjuncts  of  a  noun 
defined  as:  adjadj;  null. 
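The definitions in this listing appear to follow a compact notation: ";" separates alternative options, "," separates the elements of one option in sequence, braces enclose restrictions, "*" marks a lexical category, and square brackets enclose literal words or punctuation. That reading is inferred from the entries themselves rather than stated here, so the Python below should be taken as a hedged sketch of how such a definition could be split into options and elements. The function names are hypothetical and nothing here is PUNDIT's own reader.

```python
def split_top_level(text, separator):
    """Split `text` on `separator`, ignoring separators nested in (), {} or []."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            # A bracketed literal such as [,] or [(] is copied verbatim.
            close = text.index("]", i + 1)
            current.append(text[i:close + 1])
            i = close + 1
            continue
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        if ch == separator and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts


def parse_definition(definition):
    """Return a list of options, each a list of elements in sequence."""
    body = definition.strip().rstrip(".")
    return [split_top_level(option, ",") for option in split_top_level(body, ";")]


if __name__ == "__main__":
    print(parse_definition("adjadj; null."))
    print(parse_definition("null; {d_lv},*d"))
```

For example, `parse_definition("null; {d_lv},*d")` yields two options: the empty option `null`, and the sequence of restriction `{d_lv}` followed by the lexical category `*d`.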

!  appos  Appositive  (in  right  noun  adjunct  slot) 

Differs  from  Sager  in  support  of  parens,  explicit  punctuation, 
defined as: [,],nstg,([,];{w_endmark});
[(],nstg,[)]

#  asobjbe  AS  +  OBJect  of  BE,  e.g.,  "they  served  as  messengers",  or  in 

passive  object  option,  as  in:  "she  was  considered  as  a  candidate". 

$ assert_frag  assertion + fragment, a type of center string
defined as: assertion,internal_punct,{vso_selection};
fragment,internal_punct,{vso_selection}

assertion  subject + tense + verb + object, with optional sentence
adjuncts between these elements,
defined as: sa,subject,sa,ltvr,{wagree},sa,object,sa

# assertionz  null assertion + sentence adjunct, e.g., "they ran and fast".


#  asstg  as  string  for  comparative; 

PUNDIT  does  not  yet  have  a  real  treatment  of  comparative, 
but  currently  handles  certain  constructions  via  the 
conjunction  metarules. 

!  astg  Adjective  string  (not  really  a  string!),  for  predicate  adjective 
or  adjectives  in  complement  constructions, 
in PUNDIT, only defined as lar, not lqnr.
defined  as:  lar. 

#  as-well-as-stg 

AS  WELL  AS  STrinG,  for  conjunction. 

PUNDIT handles conjunction via metarule.

avar  Adjective  VARiant,  containing  head  of  adjective  construct, 
defined as: lcda,*adj;
lcda,*ving;
lcda,{d_ven_avar},*ven

$ be_aux  BE-AUXiliaries: possible objects following the verb be,
defined  as:  vingo; 
venpass; 
tovo 

# beingo  BEING + Object (as object of be, e.g., "He is being difficult")

PUNDIT  handles  as  be_aux  object  type. 

#  bothstg  both  string,  for  conjunction. 

PUNDIT  handles  conjunction  via  meta-rule. 

#  butstg  but  string 

PUNDIT  handles  conjunction  via  meta-rule. 

!  center  center  string  of  sentence 

differs  from  Sager  in  addition  of  fragment,  compound  options, 
defined as: {dquest2},assertion,{w_endmark},{vso_selection};
{dquest1},question,{w_endmark},{vso_selection};
imperative,{w_endmark},{vso_selection};
fragment,{w_endmark},{vso_selection};
compound,{vso_selection}

# commastg  comma string, used for conjunction.

PUNDIT  handles  conjunction  via  meta-rule. 

$  commaopt  COMMA  OPTion,  consisting  of  comma  or  null, 
defined as: ,; null.

$  compound  COMPOUND  center,  consisting  of  recursive  def.  of  assertion  or 
fragment,  followed  by  center,  or  of  a  runon  sentence, 
defined  as:  assert_frag,center;runon 

# compar  Comparative complement (e.g., It is so old that it is

decaying.)  PUNDIT  has  no  treatment  of  comparative  at  this  time. 


#  cpdnumbr  Compound  number  (e.g.,  one  hundred) 

#  csstg  CS  (subordinate  conjunction)  STrinG  list  of  options  in  sa. 

PUNDIT  handles  subordinate  clauses  as  explicit  options. 

clshould  Subjunctive form of assertion, using untensed verb
defined as: [that],subject,sa,lvr,sa,object,sa

#  dashstg  dash  string,  for  conjunction 

PUNDIT  handles  conjunction  via  metarule. 

#  dateprep  date  preposition  (e.g.,  on,  in,  until,  since,  etc.) 

PUNDIT  handles  date  as  a  special  form  of  "namestg" 
in  nvar;  dates  themselves  are  handled  via  the  "shapes” 
component. 

# dayyear  Various forms of date

PUNDIT  handles  numerical  dates  via  the  "shapes”  component. 

dpsn  Particle (e.g., up, out) + sn (embedded sentence), (e.g., He
found out that we went.)

dp1  Particle (e.g., carry on, find out), occurs as object option

defined as: {d_dpval},*dp

dp2  dp  (particle)  +  n  (noun  phrase),  occurs  as  object  option, 

defined  as:  {d_dpval},*dp,nstgo 

dp3  n  (noun  phrase)  dp  (particle),  occurs  as  object  option, 

defined  as:  {d_dpval},nstgo,*dp 

#  dp4  of-permutation  of  dp3,  e.g.,  "the  splitting  up  of  the  project” 

dp1p  dp1 (particle) + p (preposition), occurs as passive object option
defined as: dp1,p.

dp1pn  dp1 (particle) + pn (prepositional phrase)
defined as: dp1,pn

dp2pn  dp2 (particle + noun phrase) + pn (prepositional phrase)
defined  as:  dp2,pn 

dp3pn  dp3  (noun  phrase  particle)  +  pn  (prepositional  phrase) 
defined  as:  dp3,pn 

# dp4pn  of-permutation of dp3 + pn (prepositional phrase)

!  dstg  aDverb  string  (which  is  not  really  a  string,  however). 

in  Sager,  defined  recursively,  but  not  in  PUNDIT  for  now. 
defined as: ldr.

#  eitherstg  either  string,  for  conjunction 

PUNDIT  handles  conjunction  via  meta-rule. 


! embeddedq  EMBEDDED Question

handled  as  snwh  option  in  object. 

!  endmark  Punctuation  at  end  of  center  string  of  sentence 

PUNDIT lists explicit options of ".", "!" and "?" at sentence
end and connecting centers internally.

$  eqtovo  The  EQui  form  of  TOVO,  where  implicit  subject  is  same 
as  the  matrix  verb  subject,  e.g.,  "I  want  to  go", 
as  opposed  to  the  tovo  option,  "the  pump  seems  to  fail", 
where  overt  subject  is  not  really  subject  of  matrix  verb, 
defined  as:  [to],vo 

#  especially-stg 

Especially  string,  for  conjunction. 

PUNDIT  handles  conjunction  via  meta-rule 

fortovo  FOR + subject + TO + Verb (infinitive) + Object (e.g.,
For John to see her is important).

# fortovo-n  for + to + Verb + Object (less one noun phrase in Object,
e.g., the person for John to see), used for wh-gaps.
PUNDIT could handle wh-gaps via meta-rule, although this
construction is not yet handled in current meta-rule treatment.

! fraction  Fraction; PUNDIT handles via "shapes" component and
via fraction_q definition in qvar.

$ fragment  option of center, used to parse fragmentary constructions
defined as: tvo;
zerocopula;
nstg_frag;
objbe_frag


howqastg  HOW + Quantifier (much, many) or Adjective + [of] +

article  STrinG  (e.g.,  how  much  of  the  cake,  how  good  an  argument) 
defined  as:  how,([much];[many]),  ([of],*t;null) 

!  howqstg  HOW  +  Quantifier  (much,  many)  STrinG. 

Handled  as  option  of  dstg  creating  a  wh-construction. 


imperative  Imperative sentence in center string
defined as: sa, vo.

$ internal_punct  internal punctuation, separating elements in assert_frag def.
defined as:


#  introducer 

Pre-Center  connective  to  preceding  sentence  (e.g.,  and,  or, 
nor,  for) 


Not  yet  in  PUNDIT. 


la  Left  adjunct  of  Adjective 

defined  as:  null;{d_dla},*d 

lar  Left  adjunct  of  adjective  (optional)  +  Adjective  +  Right 

adjunct  of  adjective  (optional) 
defined  as:  la,*adj,ra 

! lar1  lar with limited Right adjuncts, as it occurs to the left
of a noun;
in Sager, defined as usual lxr, with ra1 "enough" or null
defined as: la,avar

# ldate  Left adjunct of date
Dates in PUNDIT handled by shapes component.

# ldater  Left adjunct of date + Date + right adjunct of date
Dates in PUNDIT handled by shapes component.

lcda  Left part of Compound Adjective
defined as: null;{d_lcda},*n

# lcdn  Left part of Compound Noun
Not in PUNDIT

# lcdva  Left part of Compound Verbal Adjective (e.g., a hog raising
farm)

# lcs  Left adjunct of cs (subordinate conjunction)

$ ld  Left adjunct of aDverb, captures recursion in adverb
defined as: null; {d_two_dstgs}, dstg.

$ ldr  Left adjunct of aDverb + aDverb + Right adjunct
adverb defined (recursively) as lxr construction,
defined as: ld, *d, rd.

! ln  Left adjunct of the Noun
PUNDIT omits nspos position of Sager,
defined as: tpos,qpos,apos,npos,{w_np_agree}

# lname  Left adjunct of a Name (e.g., Dr. Jones)

! lnamer  Left adjunct of a name + Name + Right adjunct of name
defined as: lname, *proper, rname.

lname  Left adjunct of NAME
defined as: *title;null.

# lnamesr  Left adjunct of name + possessive form of name


lnr  Left adjuncts of the noun + n (noun) + Right adjuncts
of the noun
defined as: ln,nvar,{w_noun_agree},rn,{w_ving_lnr}

lnsr  Left adjuncts of the Noun 'S (possessive-case noun) +
limited Right adjuncts of the noun
defined as: (ln,*ns,{w_noun_agree});[whose].

lp  Left adjunct (e.g., adverb) of Preposition
defined as: {d_lp},dstg;qn;null

lpro  Left adjunct (e.g., Adverb) of Pronoun (e.g., only he)
defined as: null;{d_dltpro},dstg

lq  Left adjunct of Quantifier
defined as: *adj, {w_adj_pre_q};{d_dlq},*d; null

lqnr  Left adjunct of Quantifier + Noun string + Right adjunct
defined as: lq,qnpos,rq

lqr  Left adjunct of quantifier + Quantifier + Right adjunct
defined as: lq,*q,{w_scope},rq

lt  Left adjunct of t (determiner), e.g., "all" in "all the"
defined as: *d; lqr,{w_pre_tpos};null.

ltr  Left adjunct of t + t (determiner) + Right adjunct of t
defined as: lt, *t.

ltvr  Left adjunct of tensed verb + Tensed form of Verb + Right adjunct
defined as: lv,*tv,rv

lv  Left adjunct of V (verb)
defined as: null; {d_lv},*d

lvenr  Left adjunct of verb + VEN (past participle of verb) + Right
adjunct of verb
defined as: lv,*ven,rv

lvingr  Left adjunct of v + VING (-ing form of verb) + Right adjunct
defined as: lv,*ving,rv

lvr  Left adjunct of v + verb (infinitive) + Right adjunct of v
defined as: lv,vvar,rv

# lvsa  Sentence Adjunct occurring to the Left of ving or ven in
the adjunct strings vingo and venpass

# lw  Left adjunct of w + w (the tense or a modal) (e.g., just can't)
PUNDIT handles tense (w) as regular verb class with vo object.


na  Noun phrase + Adjective (as object, e.g., paint the house red)
defined as: nstg,sa,lar


#  namepart  Name  part  (all  parts  of  proper  name  preceding  surname) 

!  namestg  NAME  STrinG  (as  value  of  nvar) 

In Sager, defined as lname + *n + rname, where lname
and rname are "name" specific, like titles, "Jr." etc.

This has been used very differently in PUNDIT, to contain
various special kinds of nouns from "shapes" component
defined as: *date; *part_number; lnamer; nq,{w_nq_number}; *time

#  nasobjbe  Noun  phrase  +  AS  +  OBJect  of  BE,  e.g.,  "she  interpreted  it 

as  a  linguist" 

Not  yet  in  PUNDIT 

#  nd  Noun  phrase  +  Adverb  (as  object,  e.g.,  put  it  here) 

Not  yet  in  PUNDIT,  but  needs  to  be  added! 

#  neitherstg 

neither  string,  used  in  conjunction. 

PUNDIT  handles  conjunction  via  meta-rule 

nn  N  (indirect  object  noun  phrase)  +  Noun  phrase 

defined  as:  nstg,  nstg. 

nnn  Nouns  occurring  as  left  adjuncts  of  a  head  noun  (e.g., 

herring  gull  colony).  Used  recursive  definition 

defined as: {dn1},*n; namestg; *n,nnn; namestg,nnn
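The recursive shape of nnn (a noun or namestg, optionally followed by another nnn) can be illustrated with a short recognizer. The Python below is a toy sketch under stated assumptions: it works on a list of category labels rather than words, the labels "n" and "namestg" stand in for *n and namestg, and the restriction in braces is ignored. It is not how PUNDIT applies the rule.

```python
def is_nnn(categories):
    """Recognize the recursive nnn pattern over a list of category labels.

    Mirrors the four options: n | namestg | n,nnn | namestg,nnn.
    """
    if not categories:
        return False
    head, rest = categories[0], categories[1:]
    if head not in ("n", "namestg"):
        return False
    # Either the head stands alone, or the remainder is itself an nnn.
    return not rest or is_nnn(rest)


if __name__ == "__main__":
    # "herring gull colony" as three noun labels.
    print(is_nnn(["n", "n", "n"]))
    print(is_nnn(["n", "adj"]))
```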

# norstg  nor string, for conjunction

PUNDIT  handles  conjunction  via  meta-rule 

#  notopt  optional  not 

"not"  treated  as  adverb  in  PUNDIT 


npn  Noun  phrase  +  prepositional  phrase  (as  object) 

defined as: nstgo,pn

npos  Noun  POSition  of  the  ordered  left  adjuncts  of  the  noun 
defined  as:  nnn;  null 

#  npsnwh  Noun  phrase  +  Preposition  +  snwh  (wh-string  as  a  Sentence 

Nominalization) 

PUNDIT  could  handle  wh-structures  via  meta-rule  but  doesn’t  yet 

#  npsvingo  Noun  phrase  +  Preposition  +  Subject  +  ving  (-ing  form  of 

verb)  +  Object 
Not  yet  in  PUNDIT 

#  npvingo  Noun  phrase  +  Prepositional  phrase  +  ving  (-ing  form  of 

verb)  +  Object 
Not  yet  in  PUNDIT 

#  npvingstg  Noun  phrase  +  Prepositional  phrase  +  vingstg  (either 

vingofn  or  nsvingo) 


PUNDIT  handles  vingstg  as  normal  noun  construct,  whose 
head  is  *ving  (see  nvar  definition) 

nq  Noun  phrase  +  Quantifier/letter  (e.g.,  the  Mark  2 

analyser,  the  Model  B  spectrophotometer) 
defined  as:  nqnvar,*q 

$  nqnvar  in  PUNDIT,  just  a  regular  noun 
defined  as  *n 

nsnwh  Noun phrase + snwh (wh-string as Sentence Nominalization)
defined  as:  nstgo,  sa,  snwh. 

# nspos  Possessive Noun of type position of the ordered left
adjuncts  of  a  noun  (e.g.,  one  lost  children’s  bicycle) 

Not  currently  in  PUNDIT 

!  nstg  Noun  string 

Currently PUNDIT does not support nwhstg option for, e.g.,
"what I like is fish"
defined as: {d_endmark},(lnr,{np_selection};
lpror;
nsvingo).

$ nstg_frag  Noun-STrinG FRAGment, e.g., "Bad disk drive."
defined as: sa,lnr,sa,{w_bare_nstg}.

nstgo  Noun  string  as  Object,  used  to  mark  objective  case  for  pronouns 
defined  as:  nstg. 

nstgt  Noun  string  of  Time,  e.g.,  "last  week" 
defined  as:  nstg. 

nsvingo  N'S (possessive-case noun or pronoun) + VING (-ing form
of a verb) + Object
defined as: {d_null_lnsr},vingo;
{d_lnsr},lnsr,vingo,{w_true_vingo}

nthats  Noun phrase + THAT + assertion (verb object option)
defined as: nstgo,sa,thats

# ntobe  Noun phrase + TO + BE + object of be
PUNDIT does not require distinction between ntobe and ntovo.

ntovo  Noun  phrase  +  to  +  V  (infinitive)  +  Object  (verb  object  option) 
e.g., "I expected them to go"

in  PUNDIT,  this  is  distinguished  from  objtovo.  Ntovo  is 
for  objects  where  the  noun  is  NOT  also  an  object  of  the 
matrix  verb. 

defined  as:  subject, [to],vo. 

# numbstg  Number string

Not  in  PUNDIT;  numbers  in  PUNDIT  handled  by  "shapes”  component. 


null  Empty String
defined as ~ (symbol for the empty string)

$ null_aux  NULL AUXiliary verb for zero-copula fragment,
e.g., "disk replaced" => "disk be replaced"
Used to mark missing auxiliary, for regularization.
defined as: ~

$ null_main  NULL MAIN verb for zero-copula fragment,
e.g., "disk bad" => "disk be bad".
defined as: ~

nulln  NULL Noun, used to mark missing head noun in "the three were here"
defined as: ~

nullobj  NULL Object, used for intransitive verbs, as in "it broke",
defined as: ~

nullwh  NULL  WH,  used  to  mark  the  wh-gap  in  questions  and  relative  clauses 
defined  as:  ~ 

nvar  Noun or VARiant: options of head for lnr construction.
Includes nouns, names, gerund as noun, and nulln (the
empty noun in e.g., "the three").
defined as: *n; namestg; *ving; {dn2},nulln,{wn1}.

# nvsa  Noun + Verb Sentence Adjunct of the type: "we know" in,
e.g., "It is, we know, unusual."
PUNDIT does not yet handle this, but will need to for spoken input.

# nwhstg  Noun position WH-STrinGs (e.g., What he cooks tastes good).
Contrast with wh-complements, i.e., sentence nominalizations
snwh, e.g., What he cooks depends on what's on sale.
PUNDIT does not yet handle these, but they can easily be added
to the wh-meta-rule treatment.

# obes  Object of be + tensed form of BE + Subject of be
Used for permuted sentence constructions, e.g., "Smart are they..."
PUNDIT does not yet handle this type of construction.

objbe  Predicate noun phrase or adjective phrase or pn or adverb
defined  as:  astg;  nstg;  {d_of},  pn. 

$  objbe_frag 

OBJect of BE as FRAGment, e.g., "down since 10/12".
defined  as:  sa,objbe,{w_predicate},{w_endmark} 

#  objbesa  OBJBE  occurring  as  Sentence  Adjunct 

Not  in  PUNDIT. 

object  The  set  of  Object  strings  of  verbs  in  active  voice 

defined as: (npn;objtovo;pnthats;nthats;pnthatsvo;nn;na;pnn;
dp2;dp3;dp2pn;dp3pn;nsnwh;dpsn;ntovo;
pn;nstgo;astg;objbe;nthats;dp1pn;dp1;
thats;assertion;svo;clshould;sven;sobjbe;
dstg;veno;be_aux;eqtovo;tovo;vo;nullobj),
{wverbobj},{w_preobj_sa}

#  objectbe  OBJBE  +  verbal  objects  of  be 

PUNDIT  uses  be_aux  and  objbe  as  options  of  OBJECT  instead. 

$  objtovo  OBJect  TO  +  Verb  +  Object  construction. 

objtovo  is  distinct  from  ntovo  in  that  the  object  serves 

both  as  object  of  the  matrix  verb  and  subject  of  the  embedded 

clause. 

defined  as:  nstg,[to],vo 

# ornot  OR NOT, terminating yes_or_no question,
e.g., "Are you coming or not?"
Not currently handled in PUNDIT.

# orstg  or string
PUNDIT handles conjunction via meta-rule.

# p1  Preposition as passive object (see passobj);
e.g., "They can be relied on".
PUNDIT uses *p instead in passive object.

# pa  Preposition + Adjective (e.g., at last)
Not in PUNDIT at the moment.

# parenstg  Parenthesis string
PUNDIT handles as option of appos.

# particularly-stg  Particularly string, used in conjunction.
PUNDIT handles conjunction via meta-rule.

passobj  Object strings in PASSive
defined as: (nullobj;pn;thats;objbe;clshould;assertion;
astg;dp1pn;dp1;snwh;*p;pnthats;pnthatsvo;dpsn;
eqtovo;tovo;dp1p),{wpassobj2}

# pdate  Date preposition + Date
PUNDIT handles dates by "shapes" component.

# permutation  Permuted forms of the center assertion string
Not in PUNDIT yet.

# perunit  Per + unit (per hour, per cent)
Not in PUNDIT.

pn  Prepositional phrase (Preposition + Noun phrase)
defined as: lp,*p,nstg,{w_pval}


$  pnpn  Repeated  prepositional  phrase 
defined as: pn,({d_of},pn;~)

pnn  Prepositional  phrase  +  Noun  phrase  (permuted  form  of  npn)  in  object 

defined  as:  pn,nstgo. 

#  pnsnwh  Prepositional  phrase  +  snwh  (wh-string  as  Sentence 

nominalization) 

Not  in  PUNDIT  yet. 

pnthats  Prepositional  phrase  +  THATS  (that  +  assertion) 
defined  as:  pn,  sa,  thats. 

pnthatsvo  Prepositional  phrase  +  THAT  +  Subject  +  Verb  +  Object 
e.g.,  "I  asked  of  them  that  they  leave" 
defined  as:  pn,  sa,  clshould. 

#  pnvingstg  Prepositional  phrase  +  vingstg  (either  vingofn  or  nsvingo) 

PUNDIT  did  not  define  separate  vingstg-related  options, 
captures  this  as  nsvingo  or  *ving  in  nvar. 

Not  in  PUNDIT 

$  predicate  PREDICATE  fragment,  consisting  of  participle 
e.g., "Replacing disk."
defined as: sa, be_aux, {w_endmark}.

#  psnwh  Preposition  +  snwh  (wh-string  as  Sentence  Nominalization) 

PUNDIT  does  not  yet  handle  snwh  constructions,  but  will 
once  wh  component  is  installed. 

#  pstg  A  subset  of  prepositional  object  strings  used  in  the  lexicon 

PUNDIT  does  not  group  these  options  together. 

#  psvingo  Preposition  +  SVINGO  (Subject  +  ving  (-ing  form  of  verb)  +  Object) 

PUNDIT  does  not  handle  this  now. 

#  pvingo  Preposition  +  vingo  (ving  +  Object) 

PUNDIT  will  handle  as  nsvingo  or  *ving  in  nvar. 

#  pvingstg  Preposition  +  VINGSTG  (either  vingofn  or  nsvingo) 

PUNDIT  will  handle  as  nsvingo  in  nvar  in  pn. 

# pwhnq  Preposition + WH-containing Noun phrase + yes-no Question
(e.g., From which side did they enter?)
PUNDIT can handle in new wh meta-rule treatment, but doesn't yet.

# pwhnq-pn  Preposition + WH-containing Noun phrase + yes-no Question
less a PN (prepositional phrase) in Object (e.g., To whom
is it attributed?)
PUNDIT could handle in new wh meta-rule treatment.

# pwhns  Preposition + wh-containing Noun phrase + assertion
(e.g., "the girl from whose apartment it was taken")
PUNDIT could handle in new wh meta-rule treatment.

# pwhns-pn  Preposition + WH-containing Noun phrase + assertion less
a PN in object (e.g., the artist to whom it is attributed)
PUNDIT could handle in new wh meta-rule treatment.

# pwhq  Preposition + WH-word + yes-no Question
(e.g., "For whom was it ordered?")
PUNDIT could handle in new wh meta-rule treatment.

# pwhq-pn  Preposition + WH-word + yes-no Question less a PN in
Object (e.g., On what is it based?)
PUNDIT could handle in new wh meta-rule treatment.

# pwhs  Preposition + wh-word + assertion

# pwhs-pn  Preposition + WH-word + assertion less a PN in object

PUNDIT  could  handle  in  new  wh  meta-rule  treatment. 

#  q-assert  Assertion  used  in  analysing  comparative 

PUNDIT  does  not  currently  handle  comparative. 

#  q-conj  Body  of  conjunction  string  following  a  coordinate. 

PUNDIT  handles  via  conjunction  meta-rule. 

#  q-invert  INVERTed  assertion  used  in  analysing  comparative 

PUNDIT  does  not  currently  handle  comparatives. 

qn  Quantifier + Noun (where Noun = name of a unit, e.g., "a 3-inch line")
defined as: lqr, {d_sing},*n.

# qnrep  Repeated qn sequence (4 lb. 2 oz.)
Not in PUNDIT yet.

# qns  Quantifier possessive NOUN ("a 4 month's history of headaches")
Not in PUNDIT yet.

# q-of  q-word (e.g., tens, dozens, lots, hundreds of)
Used in parsing numbers.
Not in PUNDIT.

! qnpos  Position of the qn string and nq string in the ordered
left adjuncts of a noun, e.g., "a two ton brick";
in PUNDIT, only qn allowed,
defined as: qn

# q-phrase  ever, usual, necessary in comparative
(e.g., "We will wait as long as usual.")
PUNDIT does not handle comparatives.

qpos  Quantifier Position of the ordered left adjuncts of a noun
defined as: lqr;null.


question  Question  as  center  string  of  a  sentence 
defined  as:  yesnoq;  wh_question. 

qvar  Quantifier Variant in lqr definition, including q, and numbers
numbers handled by shapes in general,
defined as: *q; fraction_q.

ra  Right adjuncts of an Adjective
defined as: null;
pnpn;
{d_raising_adj},tovo;
{d_equi_adj},tovo;
{d_sent},(thats;assertion).

# ra1  enough or null as Right adjunct of an Adjective
occurring as left adjunct of a noun
Not in PUNDIT; lar1 has no right adjunct.

rd  Right  adjunct  of  an  Adverb,  e.g.,  "enough" 

defined  as:  null. 


#  rdate  Right  adjunct  of  Date 

PUNDIT  handles  dates  via  "shapes"  component. 

$ rel_clause  Takes the place of Sager's rnwh options,
defined as: whRC,assertion,{w_need_gap}

! rname  Right adjunct of a Name, e.g., "Jr.", "III"
defined as: null (for now).

# rnp  Strings beginning with a Preposition as Right adjuncts
of a Noun phrase
PUNDIT does not use this intermediate node, has pnpn option instead

! rn*r  Right adjuncts of a Noun phrase (*r indicates adjunction
is repeatable)
PUNDIT does not support repetition except via pnpn rule,
defined as: {d_endmark},pnpn;vingo;
{dn_comp},(thats;clshould;tovo);sub1;
appos;astg,{w_heavy_rn};rel_clause;
zero_comp;
null;
venpass,{w_heavy_rn}
for wh, also defined as:
{d_endmark},pnpn;

#  rnsubj  Right  adjuncts  of  a  Noun  SUBJect  at  a  distance  in  sa 

(e.g.,  A  procedure  is  described  which...) 

Not  yet  in  PUNDIT. 


#  rnwh  Relative  clause,  i.e.,  WH-string,  as  Right  adjunct  of  a  Noun 


This  functionality  is  captured  via  options  in  rn. 

$ rpro  Right adjunct of PROnoun
PUNDIT distinguished lpror from lnr.
defined as: null.

! rq  Right adjunct of a quantifier, e.g., "enough" or empty,
defined as: null

#  rsubj  Roving  adjuncts  of  the  Subject  (or  a  more  proximate 

noun)  of  quantifier  type  (e.g.,  We  are  all  amazed). 

Not  in  PUNDIT. 

$ runon  RUNON sentence or sentence fragments
defined as: assertion,center,{vso_selection};
fragment,center

!  rv*r  Right  adjuncts  of  a  Verb  (*r  indicates  adjunction  is  repeatable) 
Not  repeatable  in  PUNDIT;  Sager  also  allows  dstg,  pn,  qn,  sn. 
defined  as:  null. 

# rw  Right adjunct of w (the tense or a modal) (e.g., He is

not  coming;  she  will  not  be  here) 

PUNDIT  does  not  distinguish  modals  from  regular  verbs. 

#  saconj  Sentence  Adjunct  following  a  coordinate  CONJunction 

Handled  as  simple  sa  in  PUNDIT. 

! sa*r  Sentence Adjuncts (*r indicates adjunction is repeatable)
Not repeatable in PUNDIT; also fewer options.
Specifically, the options for time nstg (nstgt), roving
adjuncts (rsubj, rnsubj), passive (e.g., "attacked by the snakes")
and comparatives are missing in PUNDIT,
defined as: null;
{d_endmark},commaopt,{dsa},
(({d_d_or_p},dstg);tovo;sub7;sub1;sub0;{d_of},pn;
({d_init_sa},vingo)),
{wmed_sa},commaopt,{w_comma_symmetry}
for wh, also defined as:
{d_post_obj},{d_nullwh_in_sa},dstg.

#  sasobjbe  Subject  +  AS  +  OBJect  of  BE 

Option  of  object,  e.g.,  "they  saw  this  as  their  opportunity” 

Not  in  PUNDIT  yet. 

#  sawh  WH-strings  in  the  set  of  Sentence  Adjuncts 

Will  eventually  be  handled  via  meta-rule. 

# sawhichstg  WHICH-STrinG (relative clause) as Sentence Adjunct
(e.g., "She left, which surprised him.")

# saz  Adjunct of a Zeroed sentence under conjunction
(e.g., He left, and fast.)
Not in PUNDIT.

#  scalestg  Scale  string  in  qn,  e.g.,  "two  feet  long” 

Not  in  PUNDIT 

! sentence  Introducer + center + endmark in Sager
No introducer in PUNDIT,
defined as: center, ([.];[?]).

# sn  Sentence Nominalization option of subject,
including thats, fortovo, tovo, clshould and snwh
Not included in PUNDIT yet.

# s-n  Assertion less one Noun phrase (i.e., headless relative clause)
Will be handled by meta-rule in PUNDIT when wh is installed.

! snwh  WH-string as a Sentence Nominalization (i.e., wh-complement)
e.g., "whether I will leave is unclear"
defined as: (whQ,assertion);(whQ,tovo).

sobjbe  Subject + Object of BE option of object
e.g., "they consider them fools",
defined as: nstg,sa,objbe,sa.

#  sobjbesa  Subject  +  OBJect  of  BE  occurring  as  Sentence  Adjunct 

Not  in  PUNDIT. 

#  stovo-n  Subject  +  TOVO-N  string  as  object  of  have 

e.g.,  "I  have  things  to  do" 

Not  yet  handled  in  PUNDIT 

subject  Subject of verb in the same string
defined as: nstg; there_def.

! sub0  Subordinate conjunction + Object of be
e.g., "after failing the test"
defined somewhat more broadly in PUNDIT, including Sager's
sub0, sub2, sub3, sub4 definitions,
defined as: *cs0,venpass;
*cs0,vingo;
*cs0,objbe.

sub1  Subordinate conjunction + assertion
e.g., "because they are leaving"
defined as: *cs1,assertion.

# sub2  Subordinate conjunction or as or than + venpass (passive
verb with its passive object)
PUNDIT captures in sub0

# sub3  Subordinate conjunction + ving (-ing form of verb) + Object
PUNDIT captures in sub0

# sub4  Subordinate conjunction + ving string (either vingofn or nsvingo)
ving string handled in nstg in PUNDIT.

# sub5  Subordinate conjunction + svingo
e.g., "despite the disk failing the test"
Not included in PUNDIT.

# sub6  Subordinate conjunction + sobjbe
e.g., "with them out sick"
Not handled in PUNDIT

sub7  Subordinate conjunction + sven
e.g., "with the crisis ended"
defined as: *cs5,sven.

# sub8  Subordinate conjunction (as) + inverted Assertion
Not in PUNDIT.

#  sub9  Should  +  svo,  subjunctive  adjunct 

e.g.,  "should  she  accept,  she  can  start  tomorrow." 

Not  in  PUNDIT. 

sven  Subject + passive verb with its passive object (venpass)
option of passive object, e.g., "I got the disk fixed"
defined as: subject,sa,venpass,sa.

# svingo  Subject + VING (-ing form of verb) + Object
Option of object, e.g., "I watched them running the race"

Not  yet  in  PUNDIT. 

svo  Subject  +  Verb  (tenseless)  +  Object 

Option  of  object,  e.g.,  "I  let  them  go" 
defined as: subject,sa,lvr,sa,object,sa.

#  tense  Position  for  tense-word  (modal) 

PUNDIT  handles  modals  as  regular  verbs. 

#  thanstg  THAN  STrinG,  for  comparative  constructions. 

PUNDIT  has  no  treatment  of  comparatives 

thats THAT + assertion; option of object
e.g., "I hope that they come"
defined as: [that],assertion.

!  thats-n  THAT  +  assertion  less  one  Noun  phrase  (relative  clause 
with  word  that  instead  of  wh-word) 

PUNDIT  handles  via  meta-rule  component  for  wh. 

$ there_def There (pleonastic); option in subject,
defined as: [there].

! title A Title used as part of a name (e.g., Mr., Ms.) in namestg
defined as an atom, option of lname.


#  tobe  TO  +  BE  as  tenseless  Verb  +  Object 

PUNDIT  covers  as  part  of  tovo  options. 

#  tostg  TO  string  (from  3  to  4  hours) 

PUNDIT handles conjunction use of "to" via meta-rule.

! tovo TO + tenseless Verb + Object

Option of verb object, e.g., "She seemed to win".
tovo in PUNDIT is split into tovo and eqtovo, to distinguish
equi cases ("I hope to win") from the raising case,
defined as: [to],vo.

tovo-n  TO  +  tenseless  Verb  +  Object  less  one  Noun  phrase  in  object 
e.g.,  "the  person  to  see" 

PUNDIT  should  handle  this  via  a  meta-rule  consistent  with 
meta-rule  wh  treatment,  but  does  not  at  this  time. 

! tpos t (The) Position of left adjuncts of noun phrase
Sager uses ltr instead of just *t.
defined as: ltr; null.

# tsubjvo Tense Subject + tenseless Verb + Object,

e.g., "Would they were gone".

Not defined in PUNDIT.

$ tvo Tensed Verb + Object fragment
e.g., "fixed the disk"
defined as: sa, ltvr, sa, object, sa.

veno  VEN  (past  participle  of  a  verb)  +  Object  option  of  object, 
e.g.,  "They  had  seen  the  light", 
defined  as:  {dsel4},lvenr,sa,object,sa. 

venpass  VEN  (past  participle  of  a  verb)  +  Passive  object 
Option of object, e.g., "it was given to her"
defined as: {dsel4},lvenr,{wpassobjl},sa,passobj,sa.

verb tensed or tenseless Verb with optional left and right adjuncts
Replaced by ltvr or lvr in PUNDIT.

# verb1 tense-word or tensed be or have in question

replaced by ltvr in PUNDIT.

# verb2 2nd Verb position in Question

replaced by lvr in object in PUNDIT.

vingo  VING  (-ing  form  of  Verb)  +  Object 

Option of object, e.g., "I am going to work"
defined as: {dsel5},lvingr,sa,object,sa.

# vingofn VING + of + Noun phrase

PUNDIT  handles  as  regular  nstg. 


#  vingstg  VING  string  (nsvingo  or  vingofn) 

PUNDIT  has  separate  nsvingo  definition  in  nstg. 

# vingstgpn VING string + pn (prepositional phrase)

PUNDIT  has  separate  nsvingo  definition  in  nstg. 

vo  tenseless  Verb  +  Object 

option of object, e.g., "I would do it",
defined as: lvr,sa,object,sa.

! vvar Verb VARiant

In Sager, defined as tensed or tenseless verb;

In PUNDIT, used for empty verbs in fragment definitions,
defined as: *v; {d_nullv},null_main; null_aux.

$ wh WH word; note that this calls nstg, which in turn calls wh_word.
defined as: where; when; why; {d_wh}, nstg.

$  whRC  WH  word  for  relative  clauses 
defined  as:  wh_word;  that. 

$  whQ  WH  word  for  questions 

defined as: {d_wh_how},dstg; [what]; wh.

$ wh_word WH word within noun phrases
defined  as:  [who];  [whom];  [which]. 

#  whats-n  WHAT  +  assertion  less  one  Noun  phrase 

In  PUNDIT, could  be  handled  as  part  of  meta-rule  treatment  for  wh. 

# whens WHEN or where or null + assertion

(when  can  be  null  if  string  adjoins  time  noun) 

PUNDIT  handles  via  meta-rule  wh  treatment. 

#  wheres  WHERE  +  assertion 

PUNDIT  handles  via  meta-rule  wh  treatment. 

# wheths WHETHer or where or when or how or why or if + assertion

+  optional  [or  not] 

Not  yet  incorporated,  could  be  part  of  wh  meta-rule  treatment. 

# whethtovo WHETHer (or other wh-words) + TO + Verb + Object

e.g.,  "whether  to  go  or  not" 

Not  yet  incorporated  in  PUNDIT,  could  be  handled  via  wh  meta-rules 

#  whevers-n  WH-EVER  (whose,  whenever,  whichever,  whatever)  +  assertion 

missing  a  Noun  phrase,  e.g.  "  whatever  they  wish" 

Not yet incorporated in PUNDIT, could be handled via wh meta-rules

whln wh-word (whose, which, what, how string) as Left adjunct of a Noun
option of tpos.

defined as: which; what; howqastg.


#  whn  Noun  phrase  or  vingofn  string  carrying  a  WH-word 

(e.g.,  whose  book  was  lost) 

Handled via whln construction and wh-treatment

# whnq-n WH-containing Noun phrase + yes-no question less Noun

(e.g., whose book have you?)

Handled via whln construction and wh-treatment

#  whns-n  WH-containing  Noun  phrase  assertion  less  one  Noun  phrase 

Handled  via  meta-rule  in  PUNDIT. 

! whq WH-word + yes-no question

Handled via meta-rule in PUNDIT.

! whq-n WH-word + yes-no question or assertion less one Noun phrase
Handled via meta-rule in PUNDIT.

! whs-n WH-word + assertion less one Noun phrase
Handled via meta-rule in PUNDIT.

yesnoq yes-no question (e.g., Have you a book? Did she leave?)
defined as: sa,ltvr,sa,subject,{w_sai},sa,object,sa.

$ zerocopula fragment with ZERO COPULA, e.g., "disk bad",
defined as: sa,subject,sa,lvr,{w_frag_verb},
sa,object,{w_pn},{w_nonnull_ln},sa.

$ zero_comp Zero-complement relative clause construction,
option of rn, as in "the person I saw"
defined as: subject,{w_zero_comp},sa,ltvr,{wagree},sa,object,sa.


Lexical  Look-Up  Procedure 
in  PUNDIT 

Lynette  Hirschman 

This document describes the lexical look-up procedure for PUNDIT. We begin with a brief
description of the lexicon and its organization. We then provide an overview of the functions of
the lexical look-up procedure. Finally, we describe in more detail the specific relations used to
implement the lexical look-up procedure. Appendix 1 provides a detailed description of the
format of a lexical entry.

1.1. Organization of the Lexicon

The  PUNDIT  lexicon  has  several  features  that  are  relevant  to  this  discussion. 

Entries  indexed  on  first  word 

Each lexical entry is entered into the (Prolog recorded) database, indexed on the first
word. Most entries, of course, have only one word; however, for multi-word expressions
(e.g., red blood cell), the entry is indexed only on the first word (red in this example).

Form  of  entry  in  lexicon 

The  colon  (both  in  prefix  and  infix  forms)  is  used  as  a  functor  in  the  lexicon.  Each  entry 
in  the  lexicon  consists  of  the  WORD,  the  index  term,  the  root,  and  the  attribute  list.  The 
source  form  of  the  lexicon  looks  as  follows: 

:(WORD,  root:  ROOT,  ATTRIBUTE_LIST). 

where ATTRIBUTE_LIST is a list of the form:

[LEXICAL_CLASS : ATTRIBUTES | MORE_ATTRIBUTES].

Idioms (multi-word expressions) are entered by use of the circumflex infix operator (^),
which connects the words in the multi-word expression, e.g.,

:(red^blood^cells, root: red^blood^cell, [n: [ncount1, plural]]).

The  colon  is  treated  as  a  regular  Prolog  relation;  code  associated  with  its  definition  causes 
the  source  entry  to  be  recorded  in  the  database,  indexed  on  the  word  (or  first  word,  in  a 
multi-word  expression),  e.g., 

recordz(red, :(red^blood^cells, root: red^blood^cell, [n: [ncount1, plural]])).

For  purposes  of  editing  and  displaying  lexical  terms,  each  word  is  also  cross-indexed  under 
its  root.  This  is  done  by  code  in  the  module  readin.pl. 

Compression  of  redundant  information 

The PUNDIT lexicon enters each morphological variant as a separate entry, since there is
(currently) no separate morphological component. As a result, there is a great deal of
redundancy between morphologically related entries. To minimize this redundancy, the
lexicon compresses information, storing the full set of attributes in the root entry, and
using pointers to this information in the morphological variants. This means that at
lexical look-up time, the look-up procedure must "reconstitute" entries for individual words
into their full form. This process is described in some detail in section ??. For example, the
entry for the word "cells" is as follows:

:(cells, root: cell, [n: [plural, 11]]).

In this entry, 11 is the pointer to the attributes associated with the noun entry. (The use
of numbers as pointers is an historical artifact, based on the representation used in the
Linguistic String Project; it could and probably should be replaced with more mnemonic
pointer labels, such as noun_attributes, verb_attributes, etc.) In order to track down the
information represented by the pointer 11, the look-up procedure goes to the entry
corresponding to the root (e.g., cell) and finds there a specification of what the pointer 11
stands for. By convention, the pointer definition follows (occurs to the right of) its
invocation in a definition (for the root word). For a non-root word (a word which differs
from its root), the definition of the pointer may either be found locally, or can be found
associated with the root entry. Thus "cell" is a root word, and the definition for "11" is
found following its invocation:

:(cell, root: cell, [n: [singular, 11], 11: [ncount1]]).

Using this information, the entry for cells is reconstituted as:

cells: [n: [root: [cell], plural, ncount1]].

This is the form returned by assembledefns, for ease of use in attaching terminals to the
parse tree. When a word is actually attached, only the particular definition corresponding
to that terminal is attached to the tree.
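
The reconstitution step can be illustrated with a small sketch. It is written in Python rather than PUNDIT's Prolog, and the dictionary layout, the function names, and the rule that a local pointer definition overrides one in the root entry are all assumptions made for this illustration. Nested feature lists such as objlist: [...] are not handled.

```python
# Hypothetical in-memory lexicon: word -> (root, definition list).
# A definition term is (head, features); an integer head marks a pointer definition.
LEXICON = {
    "cell":  ("cell", [("n", ["singular", 11]), (11, ["ncount1"])]),
    "cells": ("cell", [("n", ["plural", 11])]),
}

def pointer_table(def_list):
    """Collect the pointer definitions (integer heads) of a definition list."""
    return {head: feats for head, feats in def_list if isinstance(head, int)}

def expand(features, pointers):
    """Replace each integer pointer with the features it stands for."""
    out = []
    for feat in features:
        if isinstance(feat, int):
            out.extend(expand(pointers[feat], pointers))
        else:
            out.append(feat)
    return out

def reconstitute(word):
    """Return [(category, full feature list)] for a word."""
    root, def_list = LEXICON[word]
    # Assumption: pointer definitions found locally override those of the root entry.
    pointers = dict(pointer_table(LEXICON[root][1]))
    pointers.update(pointer_table(def_list))
    return [(head, [("root", root)] + expand(feats, pointers))
            for head, feats in def_list if not isinstance(head, int)]

print(reconstitute("cells"))
```

Here the entry for "cells" has no local definition of pointer 11, so the sketch falls back on the one recorded under the root "cell".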

Multiple  Entries 

A single word may have multiple entries in the lexicon. This can reflect incremental
additions to the lexicon, or it can reflect differing forms, e.g., different parts of speech, as in the
noun train vs. the verb train; it can result from genuine homographs, such as the verb can
used as a modal (be able) or as a transitive verb for the canning process. At times, it can
also reflect an error, where two people have independently entered the same word into the
lexicon. In any case, one function of the lexical look-up procedure is to amalgamate these
entries into a single entry for purposes of parsing. Where two entries are identical, the
program is smart enough to simply collapse them. In other cases, the union of the
attributes is recorded. For example, suppose the word slow has the following two entries, one
for the adjective and one for the verb:

:(slow, root: slow, [adj]).

:(slow, root: slow, [tv: [...], v: [...]]).

During lexical look-up, these are amalgamated into a single entry:

:(slow: [
adj: [root: [slow]],
tv: [root: [slow], plural, objlist: [...], ...],
v: [root: [slow], objlist: [...], ...]]).

If a word has two identical definitions, the redundant information is suppressed. However,
if two not-quite-identical definitions are given, they will both be passed along. For
example, if the source lexicon contains the following two entries:

:(sugar, root: sugar, n: [singular, mass]).

:(sugar, root: sugar, n: [singular, ncount1]).

then the lexical look-up procedure will generate the following entry for consumption by the
parser:

sugar: [n: [root: [sugar], singular, mass],
n: [root: [sugar], singular, ncount1]].
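
The behaviour shown for sugar (identical definitions collapsed, differing ones passed along) can be sketched as follows. This is a Python illustration only; the function name amalgamate and the list-of-pairs representation are assumptions, and the sketch does not attempt the union of attributes described for merge_entries.

```python
def amalgamate(entries):
    """Combine the reconstituted definition lists found for one word.

    entries: list of definition lists, each a list of (category, features) pairs.
    A definition identical to one already kept is dropped; any other
    definition is passed along unchanged, in order of appearance.
    """
    merged = []
    for def_list in entries:
        for definition in def_list:
            if definition not in merged:
                merged.append(definition)
    return merged

mass = ("n", ["root:sugar", "singular", "mass"])
count = ("n", ["root:sugar", "singular", "ncount1"])

# Two not-quite-identical noun definitions: both survive.
print(amalgamate([[mass], [count]]))
# The same definition entered twice: only one copy survives.
print(amalgamate([[mass], [mass]]))
```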

Shapes:  a  grammar  for  productive  forms 

The last issue concerns the problem of how to store productive forms in the lexicon. This
arises, for example, for numbers, dates, times, part numbers, etc. The solution in PUNDIT
is to use a shapes grammar (in shapes.pl), which parses the tokens within a productive
form, identifies the class (and attributes) of the lexical entry from the shape of its tokens,
and assigns it a definition on this basis. Definitions derived from the shapes component
are then added to the list of possible definitions for a word.

Choosing  a  definition 

At this point, definitions sharing the same root have been merged into a single definition;
however, there may be distinct entries due to distinct roots, or due to idiom look-up, or
due to use of the shapes component. The final stage is to choose one of these definitions to
pursue, and hand off the remainder of the word stream for further processing. (In a
bottom-up system, it would be possible to generate a lexical lattice at this point, with arcs
spanning one or more nodes, and each arc associated with a distinct definition.) For now,
the choice of definition is done by "longest first". This means that if, for example, there are
entries for both "sickle cell" and "sickle cell anemia", and the word stream matches "sickle
cell anemia", this definition will be chosen in preference to the shorter sequence "sickle
cell". However, this choice is backtrackable, so that if no parse is obtained, the system
can backtrack to this point and try a shorter (or different) expression. In general,
however, it appears to be the case that if a parse is obtained with the longer definition, it is
incorrect (and can lead to spurious ambiguities) to backtrack and obtain multiple parses.
Therefore, it would probably be appropriate to introduce some code to commit to this
choice in case a parse is obtained.
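
The "longest first" choice with backtracking can be pictured with a Python generator, used here as a stand-in for a Prolog backtrack point. The name longest_first and the (definition, words spanned) pairs are assumptions made for this sketch; it is not the PUNDIT code.

```python
def longest_first(candidates):
    """Yield (definition, words_spanned) pairs, longest span first.

    Taking the first item corresponds to the initial choice; asking for
    the next item plays the role of backtracking to a shorter alternative.
    """
    # sorted() is stable, so candidates of equal length keep their input order.
    for candidate in sorted(candidates, key=lambda c: c[1], reverse=True):
        yield candidate

candidates = [("sickle", 1), ("sickle^cell", 2), ("sickle^cell^anemia", 3)]
alternatives = longest_first(candidates)
first_choice = next(alternatives)    # tried first
second_choice = next(alternatives)   # tried only if the first leads to no parse
print(first_choice, second_choice)
```

Committing to the first choice once a parse is found, as suggested above, would correspond to simply not asking the generator for further alternatives.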

1.2.  The  Code  In  Lexical  Look-up 

This  section  documents  the  important  procedures  used  in  lexical  look-up.  The  comments 
reflect  the  current  state  of  the  code,  which  clearly  could  use  some  clean  up. 

assembledefns(+InputWordStream, -DefinitionList, -RemainingWords)

This is the top-level routine, called after the call to makeWordList has converted tokens
into words (code in reader.pl). It is called recursively, consuming one lexical unit on each call.
A lexical unit is a single word or a multi-word expression that starts at the current point and
spans one or more "words". The procedure assembledefns has the following steps:

1. Find in the lexicon ALL ENTRIES beginning with WORD
(done in lookup/2).

2. Match multi-word expressions beginning with WORD
(done in possible_entries/3);
this creates a list of possible sequences matching the input stream,
together with a notation of how many words each candidate eats up.
It creates a data structure e(Def,Num), where Num is the number of words - 1
spanned by the definition.

3. Get all the roots associated with each candidate
(done in allow_mult_roots/3);
this changes the "e(Def,Num)" data structure to "e(Def,Root,Num)".

4. Use the roots to decompress the definition (the "number lists")
(done in fill_in_def/2);
this creates a set of decompressed possible definitions;
it also changes the "e" data structure from
e(Def,Root,Num) to e(RevisedDef,Num),
where Def = :(Word,root:Root,Attributes)
and RevisedDef = :(Word,LexClassList),
where each element of LexClassList =
LexClass:[root:[Root], LexClassAtts].

5. Merge entries for a given word and same root into a single entry
(done in merge_entries/2);
this allows, for example, creation of a single def. given
two entries, one for slow: [adj], and one for slow: [v, tv].

6. See if WORD is parsable as a shape
(done in all_shape_entries/2, modify_shape_entries/2);
this produces an additional list of definitions,
which is appended to the existing list.

7. Select the LONGEST definition
(done by choose_def/4) ** this is a backtrack point **

8. Call assembledefns/3 recursively to process the rest of the word stream.


lookup(+Word, -ListOfDefsStartingWithWord)

The procedure lookup consults the lexicon for all entries stored under Word and returns
all distinct definitions found under the key Word that start with Word. This may include
multi-word definitions and multiple definitions with either the same or different root forms.

possible_entries(+ListOfDefsStartingWithWd, +RemainingWds, -ListOfMatchingDefs)

This procedure takes the list of possible definitions generated by lookup and tries to
match multi-word expressions against the input stream. It generates a data structure
e(Def,Num), where Num is the number of additional words consumed from the input stream.
This is eventually used to select the longest multi-word expression from competing possible
alternatives.
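
The matching step can be sketched in Python as follows. The tuple-of-words representation of a definition and the function body are assumptions for illustration; only the name possible_entries and the meaning of Num (additional words consumed) come from the description above.

```python
def possible_entries(defs_starting_with_word, remaining_words):
    """Keep the definitions whose extra words match the front of the input.

    defs_starting_with_word: word tuples, e.g. ("sickle", "cell"), all of
        which begin with the current word.
    remaining_words: list of the input words after the current word.
    Returns (definition, num) pairs, where num is the number of
    additional words the definition consumes.
    """
    matches = []
    for words in defs_starting_with_word:
        tail = list(words[1:])
        if remaining_words[:len(tail)] == tail:
            matches.append((words, len(tail)))
    return matches

defs = [("sickle",), ("sickle", "cell"), ("sickle", "cell", "anemia"),
        ("sickle", "shaped")]
print(possible_entries(defs, ["cell", "trait"]))
```

A single-word definition always matches with Num equal to 0, since it consumes no additional words.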

allow_mult_roots(+ListOfMatchingDefs, +TempList, -RevisedEStructureList)

This procedure takes the output of possible_entries, namely ListOfMatchingDefs, and
generates extra entries for any definition that is not its own root, but points back to a root
definition that has multiple entries. It creates a list element for each entry paired with a
specific root definition. It also revises the e data structure to have the form e(Def,Root,Num).

fill_in_def(+RevisedEStructureList, -FilledInDefList)

This procedure handles the "decompression" of pointers into explicit attribute lists. Its
input is the revised "e" structure list from allow_mult_roots. Its output is a differently
structured definition list, with pointers replaced by attribute lists. The output list is structured for
ease of use in parsing. Thus the list consists of the word or words, followed by the list of lexical
classes. Within each lexical class, we find the root and the remaining attributes associated with
that lexical class. Thus the definition list now has the form:

Word: [Lex_class1: [root: [Root1] | Lex_class_att_list1],
Lex_class2: [root: [Root2] | Lex_class_att_list2],
...]


The procedure fill_in_def calls on fill_in_attrbs, which works right to left and has responsibility
for both capturing pointer definitions (and collecting them for use in resolving pointer
references) and resolving invocations of pointer definitions, either by looking at those pointer
definitions already captured, or by finding the root, and capturing the definitions from the root
word.

merge_entries(+FilledInDefList, -MergedDefList)

The procedure merge_entries merges all definitions consuming the same number of words
into a single entry. In addition, it merges entries with identical roots into a single lexical-class
entry. For example, it would convert the following input to a single entry, first by combining the
two entries for word1^word2, then by combining the attribute lists for the entries with identical
roots.

[e(word1^word2: [n: [root: [word1^word2], ncount1]], 1),
e(word1^word2: [n: [root: [word1^word2], mass]], 1)]

==>

[e(word1^word2: [n: [root: [word1^word2], ncount1, mass]], 1)].

all_shape_entries(+WordStream, -ShapeDefList)

This procedure invokes the shapes grammar against the input stream and produces a set
of possible pairs consisting of a shape length and its definition. The shapes grammar is defined
in shapes.pl and provides entries for productive forms, such as numbers, dates, etc. If the
shapes grammar produces no entries, the empty list is returned.

modify_shape_entries(+WordNumShapeEntryPairs, -ModifiedShapes)

This procedure takes as input a list of pairs of the form ShapeLength-ShapeDef and returns
the appropriate "e" structure list, so that the shapes definitions can be merged with the
previously collected definitions.

choose_def(+ListOfPossibilities, -ChosenDef, +WdsAfterStartWd, -RemainingWds)

The procedure choose_def takes as its input the merged set of definitions from the regular
lexical look-up procedure and from shapes and selects the definition spanning the largest number
of words. It also creates a backtrack point, so that the remaining definitions can be explored
via backtracking into lexical look-up if desired.
SPECIFICATION OF LEXICAL ENTRIES IN PUNDIT
François Lang

This is an attempt to formalize Pundit's lexical entries, which I have coded as part of an
error-checking mechanism to be added to the readin.pl file. The reason for doing this is
that there is currently no mechanism for ensuring the well-formedness of lexical entries which
are read in. In fact, as I'll point out, there are a disturbingly large number of lexical entries
currently in some lexicon file which, for one reason or another, are bogus.

All  terminology  set  in  slanted  font  is  defined  in  what  follows. 

The following must be true of a Pundit lexical entry:

1.  It  is  a  (syntactically  correct)  ground  Prolog  term. 

2.  Its  principal  functor/arity  is  :/3. 

3.  Its  first  argument  is  a  lexical  item. 

4.  Its  second  argument  is  a  term  of  the  form  root: Root,  where  Root  is  a  lexical  item. 

5.  Its  third  argument  is  a  definition  list. 

A lexical item is either a lexical atom¹ or an idiom.

A lexical atom is one of the following:


1. an atom containing exactly one character C such that if A is the ASCII code of C, the
goal singleCharacterWord(A) succeeds.²

2. an atom containing only the following characters:

(a) alphanumerics (i.e., a ... z, A ... Z, and 1 ... 9),

(b) the single- and double-quote characters ' and ", and

(c) the underscore character _.

¹ Throughout this specification, I have tried to be very careful to distinguish atoms and atomic terms.
The distinction is that numbers are atomic terms, but not atoms. I.e., if X is currently instantiated to a
number, the goal atomic(X) succeeds, but the goal atom(X) does not. I specify here that lexical items are
atoms, and not atomic terms, because numbers are now analyzed by the shapes component, and have been
taken out of the lexicon.

² The predicate singleCharacterWord/1 is defined in the file reader.pl.

An idiom is a term of the form X^Y where X is either an integer or a lexical atom, and Y is
either an integer, a lexical atom, or itself an idiom. E.g., starting^air^compressor is an
idiom, as is the first argument of :(cgn^(-)^25, root:bainbridge, [proper]).

A  definition  list  is  a  (possibly  empty)  list  of  definition  terms. 

A definition term is one of the following:

1.  A  category  definition, 

2.  A  pointer  definition, 

3.  A  lexical  category. 

A category definition is a term of the form Cat:FeatureList, where

1. Cat is a lexical category, and

2. FeatureList is a feature list.

A pointer definition is a term of the form Pointer:Definition, where

1. Pointer is an integer, and

2. Definition is a feature list.

A lexical category is a term C such that the goal get_type(C, atomic_node) succeeds (e.g.,
adj, n, p, pro, proper, q, v, ven, ving). All lexical categories are atoms.

The Cat and Pointer terms appearing as the left-hand arguments of :/2 in category
definitions and pointer definitions, respectively, can be referred to as definition heads.

There  are  a  few  additional  constraints: 

• At least one definition term in a non-empty definition list must be either a category
definition or a lexical category. In other words, it is incorrect for all definition terms in
a definition list to be pointer definitions. E.g., the following list is not a valid definition
list:

[12:[vendadj,h-modal], 3:[sasobjbe,nstgo,vingo], 11:[ncount1,nonhuman]]

• A pointer definition in a definition list must appear after (i.e., to the right of) all
references to it. This means that pointer definitions should in general appear after all
other definition terms in a definition list. E.g., neither of the following lists is a valid
definition list:

[v:[12], 12:[objlist:[nstgo]], tv:[12,plural]]

[v:[13], tv:[13,plural], 15:[pval:[for,to]], 13:[objlist:[nstgo,npn:[15]]]]
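
The two constraints above lend themselves to a mechanical check. The following Python sketch is one possible rendering, not the error-checking code intended for readin.pl; the tuple representation of definition terms, the function names, and the wording of the messages are assumptions. It does not report references to pointers that are never defined.

```python
def references(features):
    """Yield every integer pointer used in a (possibly nested) feature list."""
    for feat in features:
        if isinstance(feat, int):
            yield feat
        elif isinstance(feat, tuple):  # (feature head, expansion)
            yield from references(feat[1])

def check_definition_list(def_list):
    """Return a list of problems found in a definition list.

    A definition term is either a bare lexical category (a string) or a
    (head, features) pair; an integer head marks a pointer definition.
    """
    problems = []
    heads = [t[0] if isinstance(t, tuple) else t for t in def_list]
    if def_list and all(isinstance(h, int) for h in heads):
        problems.append("every definition term is a pointer definition")
    for i, term in enumerate(def_list):
        if not isinstance(term, tuple):
            continue  # bare lexical category, no feature list to inspect
        for ptr in references(term[1]):
            if ptr in heads[:i]:
                problems.append(f"pointer {ptr} is defined before a reference to it")
    return problems

bad_order = [("v", [12]), (12, [("objlist", ["nstgo"])]), ("tv", [12, "plural"])]
nested = [("v", [13]), ("tv", [13, "plural"]), (15, [("pval", ["for", "to"])]),
          (13, [("objlist", ["nstgo", ("npn", [15])])])]
only_pointers = [(12, ["vendadj", "h-modal"]), (11, ["ncount1", "nonhuman"])]
well_formed = [("n", [11, "singular"]), (11, ["ncount1"])]

for example in (bad_order, nested, only_pointers, well_formed):
    print(check_definition_list(example))
```

Returning a list of problems, rather than raising an error, matches the recommendation that reading in a lexicon should continue after a warning.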




A  feature  list  is  a  (possibly  empty)  list  of  feature  terms. 
A feature term is one of the following:


1. A term of the form Feature:Expansion, where

(a) Feature is a feature head, and

(b)  Expansion  is  a  feature  list. 

2.  A  feature. 


A  feature  head  is  one  of  the  following: 

1.  A  lexical  attribute 

2. A term of the form X-Y (leftover LSP medical categories), where X and Y are both
atoms. In such terms, X will almost always be the atom h. A complete listing of all
such "hospital terms" currently appearing in Pundit lexicons has been collected. Note
that these X-Y terms are not atoms, contrary to popular belief and expectations. These
X-Y terms are totally irrelevant to all current uses of Pundit, and could (should) be
removed from our lexicons. Removing them would simplify this formalization of lexical
entries.

A feature is one of the following:

1. A term of the form X-Y as above

2. An idiom

3. An atomic term³


A lexical attribute is any one of a well-defined set of atoms (such as objlist, pobjlist,
pval, and dpval).

As mentioned earlier, there are currently a number of lexical entries which do not meet these
criteria. Some of them have problems not directly related to their form. For example, many
lexical entries assume that cs2, cs3, cs4, cs5, cs6, cs7, cs8, int, punct, and w are lexical
categories. However, since none of these atoms appears prefixed by * in the body of a BNF
grammar rule, they are not known as atomic nodes, and thus are not lexical categories. If such
an unknown lexical category is encountered while reading in a lexicon, an appropriate and
perspicuous warning message should be issued, but reading in the lexicon should be allowed
to continue.

³ We cannot restrict features which are not idioms and not "hospital terms" to be just lexical attributes,
since both numerical pointers and lexical items (prepositions, for example), neither of which is a lexical
attribute, regularly appear as features in feature lists, as in n: [singular, 11] and pn: [pval: [off, from]].
It might be possible somehow to restrict non-idiom features to integers, lexical items, and lexical attributes,
but this will require more thought. For now, we will say no more than that these features are atomic.

In addition, there are the following entries (at least) which are simply bogus. Again, if such
an ill-formed entry is encountered while reading in a lexicon, a warning message should be
issued, but processing should continue. For most of these entries, it is left as an exercise to
the reader to determine the exact problem!

:(re-examination,
root:examination,
[n:[11,singular], 11:[nonhuman,h-vmd,h-rep,h-record]])

:(re-examinations,
root:examination,
[n:[11,plural]])

:(seeking,
root:seek,
[ving:[12], vveryving])

Hint:  vveryving  is  not  a  lexical  category. 

:(shear,
root:shear,
[v:[12], tv:[12,plural],
12:[objlist:[nstgo,nullobj,[dp],npn:[pval:[off,from]]]]])

:(sheared,
root:shear,
[tv:[12,past], ven:[14],
14:[12,pobjlist:[nullobj,[dp],pn:[pval:[off,from]]]]])

:(timing,
root:time,
[n, singular, ving:[12]])

:(try,
root:try,
[n:[11,singular], v:[12], tv:[12,plural],
12:[objlist:[3],notnobj:[1],vendadj,h-modal],
3:[sasobjbe,dp4:[15],dp2:[15],dp3:[15],nstgo,vingo,eqtovo,npn:[16],nullobj],
16:[pval:[on]], 1:[ntime1,ntime2], 14:[objlist:[3],vendadj,pobjlist:[4]],
4:[asobjbe,dp1:[15],pn:[16],nullobj], 15:[dpval:[out]],
11:[ncount1,nonhuman,h-_1882]])

Hint: Look at the very end of the entry. Also, the pointer definitions for 3 and 16 appear
before they are referenced.

:(works,
root:work,
[n, singular, tv:[12]])

:(nimitz,root,nimitz,[proper])
The Prolog Structure Editor is a general structure editor written in Prolog.
It  is  intended  to  make  it  easy  to  edit  Prolog  terms  by  allowing  the  user  to 
edit  a  term  by  traversing  its  internal  structure.  As  used  in  the 
Natural  Language  group,  the  Prolog  Structure  Editor  allows  you  to  edit 
grammar  rules,  word  definitions  in  the  lexicon,  and  arbitrary  Prolog  clauses. 
You  may  invoke  the  editor  on  one  of  these  three  types  of  structures  by  using 
one  of  the  following  top-level  procedures: 

edit_rule(Key) 

edit_word(Word) 

edit_clause(Functor)

The  edit_rule  procedure  takes  as  its  argument  the  name  of  a  non-terminal. 
It  then  allows  you  to  edit  all  of  the  grammar  rules  that  define  that  non¬ 
terminal.  In  order  to  maintain  consistency  between  the  grammar  rules  and 
their  translated  versions,  when  you  have  completed  editing  the  set  of  rules  the 
editor  will  ask  if  you  want  to  re-translate  them. 

The edit_word procedure takes as its argument a word from the lexicon.
It then allows you to edit the definitions of that word and all of its
morphological variants.

The edit_clause procedure takes as its argument the name of some Prolog
procedure. The editor finds all clauses with that head and returns them as
a set of clauses to be edited. While this option is only of limited use in
Quintus Prolog (because the procedure being edited must be declared dynamic),
in Symbolics Prolog it will allow you to edit any Prolog procedure.

Once you have called one of the three procedures that invoke the editor,
you will enter the top level. This level is distinguished from lower levels in
that you are not actually editing a Prolog term, but editing a set of terms. At
this level you can perform operations on the set of clauses in the procedure,
like retracting an old clause or asserting a new clause. *

* These changes to the database are not actually recorded until the editing
session is finished.


The editor will report at every level what kind of structure you are editing.
The  kinds  of  structures  that  the  editor  knows  about  are: 

A  Set  of  clauses  (top  level  only) 

A  List  of  terms 

A  Conjunction  of  terms  (actually  any  infix  right-associative  operator) 

A  Complex  Term  (A  functor  followed  by  some  number  of  arguments) 

An  Atom 

It  will  then  display  the  functor  of  the  term  (if  appropriate)  and  the  members  of 
the  term.  Following  are  some  examples: 

Editing a Set of Rules
Rule1: objectbe::=(astg;nstg;pn),{sem_rep(append)}
Rule2: objectbe::=(vingo;venpass),{sem_rep(copy)};{dwh2},nullwh

Editing a term
Functor: ::=
Argument 1: objectbe
Argument 2: (vingo;venpass),{sem_rep(copy)};{dwh2},nullwh

Editing conjoined terms
Functor: ,
Term 1: {dwh2}
Term 2: nullwh


There are two types of commands that you can give to the structure editor:
movement commands and editing commands. At every level in the editor, you are
stationed at some Prolog term (except at the top level, where you are
stationed at a set of terms). There are two kinds of movement commands:
downward movement and upward movement. A downward movement command is simply
an integer, from 1 to N, that specifies which of the arguments of the current
term you wish to move down to. (If the term has a functor, the functor is
considered the 0th argument, so you can sometimes move to the 0th item.) At
any term, only one direction is up, so the command 'u' will move you up one
level in the structure. For convenience, the command 't' (for 'top') will
move you to the top level.
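
The movement commands can be pictured with a small model. The following is only a sketch in Python, not the editor's Prolog implementation: the names Term and Cursor are hypothetical, the 0th item (the functor) is left out, and 'u' at the top simply stays put, whereas in the real editor it ends the session.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A Prolog-like term: a functor plus a list of argument terms."""
    functor: str
    args: list = field(default_factory=list)


class Cursor:
    """Remembers where we are stationed as a path of 1-based indexes."""

    def __init__(self, root):
        self.root = root
        self.path = []

    def current(self):
        """Follow the path down from the root to the current term."""
        term = self.root
        for index in self.path:
            term = term.args[index - 1]
        return term

    def move(self, command):
        """Apply 'u', 't', or an integer string; return the new current term."""
        if command == "u":
            if self.path:              # nothing above the top in this sketch
                self.path.pop()
        elif command == "t":
            self.path.clear()
        else:
            index = int(command)
            if not 1 <= index <= len(self.current().args):
                raise IndexError(f"no argument {index} at this term")
            self.path.append(index)
        return self.current()


# A simplified rule: objectbe ::= {dwh2},nullwh
conjunction = Term(",", [Term("{dwh2}"), Term("nullwh")])
cursor = Cursor(Term("::=", [Term("objectbe"), conjunction]))
```

Storing the position as a path of indexes, rather than as a pointer to the current term, is what makes 'u' and 't' trivial here: moving up drops the last index and going to the top empties the path.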


An editing command is one which actually modifies the structure of the
term that you are stationed at. The editing commands that are currently
supported are:

delete

    Specified by 'd<integer>'. This command deletes the named term.

insert-after

    Specified by 'i<integer>'. This command inserts a new term after the
    mentioned term. 'i0' will make the new term the first argument. You will
    be prompted for the term that is to be inserted. As this new term is a
    Prolog term, you will have to end your input with a period.

replace

    Specified by 'r<integer>'. This command replaces the mentioned term with
    a new term, which you will be prompted for. You can sometimes replace the
    0th item, if you want to change the functor of some complex term.
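
To make the index conventions of the three editing commands concrete, here is a minimal sketch in Python. It is an illustration only: the function apply_edit is hypothetical, terms are plain strings, and replacing the 0th item (the functor) is not modelled.

```python
import re


def apply_edit(args, command, new_term=None):
    """Return a new argument list after one 'd<n>', 'i<n>' or 'r<n>' command.

    Indexes are 1-based, as in the editor; 'i0' inserts before the first
    argument. The original list is left unchanged.
    """
    match = re.fullmatch(r"([dir])([0-9]+)", command)
    if match is None:
        raise ValueError(f"not an editing command: {command!r}")
    operation, n = match.group(1), int(match.group(2))
    lowest = 0 if operation == "i" else 1
    if not lowest <= n <= len(args):
        raise IndexError(f"no item {n} at this term")
    if operation != "d" and new_term is None:
        raise ValueError("insert and replace need a new term")

    result = list(args)
    if operation == "d":
        del result[n - 1]              # delete the named term
    elif operation == "i":
        result.insert(n, new_term)     # new term lands after item n
    else:
        result[n - 1] = new_term       # replace item n
    return result


elements = ["tv:[12,past]", "ven:[14]"]
after_insert = apply_edit(elements, "i1", "stuff")
```

Here after_insert holds the new term in second position, after the first element, which is the same effect as the 'i1' command on a list.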

These commands are also available:

downward-movement

    Specified by <integer>. Moves down to the Nth term of the current term.

move-up

    Specified by 'u'. Moves up to the term that contains the current term.

go-to-top

    Specified by 't'. Moves to the top level.

abort

    Specified by 'a'. Ends the editing session and does not save any of the
    changes made!

print

    Specified by 'p'. This command prints the structure of the current term.
    This command should only be used at the end of a command line.

help

    Specified by '?'. This command prints out a listing of the available
    commands. It should only be used at the end of a command line.
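
Several commands may be given on one command line (the sample session uses "2 r2"). The sketch below, in Python, shows one way such a line could be split up and checked. It is an assumption-laden illustration: parse_command_line and the two tables are hypothetical names, and whitespace separation is inferred from the sample session rather than documented.

```python
import re

SIMPLE_COMMANDS = {"u": "move-up", "t": "go-to-top", "a": "abort",
                   "p": "print", "?": "help"}
EDIT_COMMANDS = {"d": "delete", "i": "insert-after", "r": "replace"}


def parse_command_line(line):
    """Split a line such as '2 r2' into (command name, integer or None) pairs."""
    tokens = line.split()
    commands = []
    for position, token in enumerate(tokens):
        is_last = position == len(tokens) - 1
        if re.fullmatch(r"[0-9]+", token):
            commands.append(("downward-movement", int(token)))
        elif re.fullmatch(r"[dir][0-9]+", token):
            commands.append((EDIT_COMMANDS[token[0]], int(token[1:])))
        elif token in SIMPLE_COMMANDS:
            # 'p' and '?' are only allowed at the end of a command line
            if token in ("p", "?") and not is_last:
                raise ValueError(f"{token!r} must end the command line")
            commands.append((SIMPLE_COMMANDS[token], None))
        else:
            raise ValueError(f"unknown command: {token!r}")
    return commands


parsed = parse_command_line("2 r2")
```

For the line "2 r2", parsed is a downward movement to item 2 followed by a replace of item 2.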


If  you  command  the  editor  to  insert  a  whole  rule,  word,  or  clause,  it  will 
print  out  an  entry  from  the  set  you  are  editing  (to  serve  as  a  "template"  of  the 
type  of  entry  you  want  to  create)  and  ask  you  to  edit  that  entry  to  form  the 
new  entry  that  you  want  to  insert.  (As  a  warning,  do  not  insert  a  structure  and 
then  delete  it.  Instead,  wait  until  the  end  of  the  editing  session,  and  answer 
"no"  to  the  query,  "do  you  want  to  add  ...?")  If  you  try  to  replace  a  rule  or 
clause,  the  editor  moves  to  that  rule  or  clause  and  asks  you  to  replace  each 
part  individually  (this  is  supposed  to  save  you  keystrokes). 


To  end  the  editing  session,  you  must  be  at  the  top-most  structure  (a  set  of 
rules,  a  set  of  words,  a  set  of  clauses).  At  that  level,  type  ’u’  (or  ’t’),  and  the 
editor  will  ask  you  if  you  want  to  save  the  changes  that  you  have  made. 
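
Because nothing is recorded until the session ends, the editor behaves as if it worked on a private copy of the rules, words, or clauses. A minimal sketch of that idea in Python follows; EditSession and its attributes are hypothetical, and the dictionary stands in for the Prolog database.

```python
import copy


class EditSession:
    """Edits a private copy of one database entry until the session ends."""

    def __init__(self, database, key):
        self.database = database
        self.key = key
        # Work on a deep copy so the database is untouched while editing.
        self.working = copy.deepcopy(database[key])

    def finish(self, save):
        """Record the edited copy if the user agreed to save; return the entry."""
        if save:
            self.database[self.key] = self.working
        return self.database[self.key]


database = {"objectbe": ["rule 1", "rule 2"]}

aborted = EditSession(database, "objectbe")
aborted.working[1] = "rule 2, edited"
entry_after_abort = list(aborted.finish(save=False))   # like 'a', or answering 'n'

saved = EditSession(database, "objectbe")
saved.working[1] = "rule 2, edited"
entry_after_save = list(saved.finish(save=True))       # like answering 'y'
```

In this model, aborting and answering "no" at the end of the session both amount to discarding the working copy.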


A SAMPLE EDITING SESSION:


| ?- edit_rule(objectbe).

Editing a Set of Rules
Rule1: objectbe::=(astg;nstg;pn),{sem_rep(append)}
Rule2: objectbe::=(vingo;venpass),{sem_rep(copy)};{dwh2},nullwh
Command: 2

        /* edit the second rule

Editing a term
Functor: ::=
Argument 1: objectbe
Argument 2: (vingo;venpass),{sem_rep(copy)};{dwh2},nullwh
Command: 2

        /* move to the second argument

Editing conjoined terms
Functor: ;
Term 1: (vingo;venpass),{sem_rep(copy)}
Term 2: {dwh2},nullwh
Command: 2 r2

        /* replace the second argument of the second term

Replace the term: nullwh
with what Prolog term: stuff.

Editing conjoined terms
Functor: ,
Term 1: {dwh2}
Term 2: stuff
Command: t

        /* go to the top level

Editing a Set of Rules
Rule1: objectbe::=(astg;nstg;pn),{sem_rep(append)}
Rule2: objectbe::=(vingo;venpass),{sem_rep(copy)};{dwh2},stuff
Command: t

        /* go to the top again, i.e., finish editing this rule
           ('t' and 'u' have the same effect at this level)

Do you want to replace: objectbe::=(vingo;venpass),{sem_rep(copy)};{dwh2},nullwh
with: objectbe::=(vingo;venpass),{sem_rep(copy)};{dwh2},stuff
Enter 'y' or 'n':

If you have changed any grammar rules, you will have to either:

1. Retranslate this rule.
2. Switch the grammar to run interpreted only.
3. Do nothing (and risk inconsistency!).

Please enter 1, 2, or 3: 3.

yes


| ?- edit_word(replace).

Editing a set of words with the same root
Word 1: :(replace,root:replace,[v:[12],tv:[12,plural],12:[objlist:[nstgo,pn:[pval:[with]],npn:[pval:[with]]]]])
Word 2: :(replaces,root:replace,[tv:[12,singular]])
Word 3: :(replaced,root:replace,[tv:[12,past],ven:[14],14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]])
Word 4: :(replacing,root:replace,[ving:[12]])
Command: 3

        /* edit the third word

Editing a term
Functor: :
Argument 1: replaced
Argument 2: root:replace
Argument 3: [tv:[12,past],ven:[14],14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]]
Command: 3

        /* move to the third argument (a list)

Editing a list
Element 1: tv:[12,past]
Element 2: ven:[14]
Element 3: 14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]
Command: i1

        /* insert an element into the list after the first
           element of the list

What Prolog term should be inserted: stuff.

Editing a list
Element 1: tv:[12,past]
Element 2: stuff
Element 3: ven:[14]
Element 4: 14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]
Command: u

        /* go up one level

Editing a term
Functor: :
Argument 1: replaced
Argument 2: root:replace
Argument 3: [tv:[12,past],stuff,ven:[14],14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]]
Command: t

Editing a set of words with the same root
Word 1: :(replace,root:replace,[v:[12],tv:[12,plural],12:[objlist:[nstgo,pn:[pval:[with]],npn:[pval:[with]]]]])
Word 2: :(replaces,root:replace,[tv:[12,singular]])
Word 3: :(replaced,root:replace,[tv:[12,past],stuff,ven:[14],14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]])
Word 4: :(replacing,root:replace,[ving:[12]])
Command: u

Do you want to replace: :(replaced,root:replace,[tv:[12,past],ven:[14],14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]]).
with: :(replaced,root:replace,[tv:[12,past],stuff,ven:[14],14:[12,pobjlist:[nullobj,pn:[pval:[with]]]]]).
Enter 'y' or 'n': n

yes


| ?- edit_word(control).

Editing a set of words with the same root
Word 1: :(control,root:control,[n:[11,singular],v:[12],tv:[12,plural],11:[nonhuman,h-change,h-norm],12:[objlist:[1],notnsubj:[2],vmanner,h-change,h-norm],1:[nstgo,nsvingo,vingofn],2:[ntime1]])
Word 2: :(controlled,root:control,[tv:[12,past],ven:[14],14:[objlist:[1],notnsubj:[2],vmanner,pobjlist:[3],h-change,h-norm],3:[nullobj]])
Word 3: :(controlling,root:control,[ving:[12]])
Word 4: :(controls,root:control,[n:[11,plural],tv:[12,singular]])
Command: i4

        /* insert a word after the fourth word

Here is a word of the type that you want to create.
Edit it to make the new word.

Editing a term
Functor: :
Argument 1: control
Argument 2: root:control
Argument 3: [n:[11,singular],v:[12],tv:[12,plural],11:[nonhuman,h-change,h-norm],12:[objlist:[1],notnsubj:[2],vmanner,h-change,h-norm],1:[nstgo,nsvingo,vingofn],2:[ntime1]]
Command: r1

        /* replace the first argument, in this case,
           the word to be defined

Replace the term: control
with what Prolog term: controller.

Editing a term
Functor: :
Argument 1: controller
Argument 2: root:control
Argument 3: [n:[11,singular],v:[12],tv:[12,plural],11:[nonhuman,h-change,h-norm],12:[objlist:[1],notnsubj:[2],vmanner,h-change,h-norm],1:[nstgo,nsvingo,vingofn],2:[ntime1]]
Command: r3

        /* replace the third argument, in this case, the
           definition list

Replace the term: [n:[11,singular],v:[12],tv:[12,plural],11:[nonhuman,h-change,h-norm],12:[objlist:[1],notnsubj:[2],vmanner,h-change,h-norm],1:[nstgo,nsvingo,vingofn],2:[ntime1]]
with what Prolog term: [n:[11,singular],11:[human]].

Editing a term
Functor: :
Argument 1: controller
Argument 2: root:control
Argument 3: [n:[11,singular],11:[human]]
Command: t

Editing a set of words with the same root
Word 1: :(control,root:control,[n:[11,singular],v:[12],tv:[12,plural],11:[nonhuman,h-change,h-norm],12:[objlist:[1],notnsubj:[2],vmanner,h-change,h-norm],1:[nstgo,nsvingo,vingofn],2:[ntime1]])
Word 2: :(controlled,root:control,[tv:[12,past],ven:[14],14:[objlist:[1],notnsubj:[2],vmanner,pobjlist:[3],h-change,h-norm],3:[nullobj]])
Word 3: :(controlling,root:control,[ving:[12]])
Word 4: :(controls,root:control,[n:[11,plural],tv:[12,singular]])
Word 5: :(controller,root:control,[n:[11,singular],11:[human]])
Command: t

Do you want to add the word: :(controller,root:control,[n:[11,singular],11:[human]]).
Enter 'y' or 'n': y

yes


System  Administration  for  Pundit  (SAP) 


File: ~nlp/bin/SA_bin/README

Author:  Korrinn  Fu 

Date:  3/23/89;  4/14/89;  4/17/89;  5/3/89 


I.  SAP  Overview 

This is our new tool for Pundit system administration. We named it "SAP", which stands for
System Administration for Pundit. SAP is an interactive tool which will guide you through the
system administration process. It provides menu choices for each step of the process, so you
will no longer need printouts of other documentation in order to do system administration.

For those of you interested in seeing further documentation on the system administration
process, look into the README files in ~nlp/NEWFILES/system_administration/command_files,
and in its subdirectories NEWFILES_cf and pundit_cf. Another piece of useful information is in
~nlp/NEWFILES/system_administrator/checklist. Some of the information is outdated (we
don't do system administration on the VAX anymore). However, the checklist has an excellent
overall description of the entire system administration process which SAP performs.


II. Software architecture

The software architecture of SAP is illustrated in a gremlin figure, SA_structure.grn, which
depicts the flow of control of SAP. The shell scripts in ~nlp/bin/SA_bin are also
fully documented with input/output parameters.

To look at this structure, all you have to do is (in suntools):
gremlin SA_structure.grn

or print it out on the imagen just as you would print other gremlin pictures. To print this file,
all you have to do is:

1. create a file with the following 3 lines:

.GS
file SA_structure.grn
.GE

2. print it out on the imagen:

grn <filename created in 1.> | ditroff -me -Pip<1 or 2>


III. How to use SAP

First, make sure that ~nlp/bin is in your path (in your .login or .cshrc); otherwise you will
need to enter the entire path.

Second, you need to be user nlp to execute SAP. To do this, just type at the unix prompt:
su nlp

and enter the password when prompted.

To start the system administration procedure, type at the unix prompt:

sap

This  will  display  an  overall  picture  of  the  system  administration  process  and  a  menu.  The  menu 
choices  are: 

1.  make  NEWFILES  images 

2.  update  pundit  (update_pundit) 

3.  make  stable  images 

4.  clean  up 

5.  undo  and  redo 

Choice 1 allows you to make the NEWFILES images. Choice 2 activates the command
update_pundit to move new files to the stable directory. (There is a man page for this
command.) Choice 3 creates the stable pundit images. Choice 4 allows you to remove files that are
no longer needed, and archive other files for future use. Choice 5 is a menu for NEWFILES and
pundit that allows you to do a partial restart on images. You can choose to redo the NEWFILES
images or the pundit stable images, and either all of the images or just a subset of them.

Since each menu choice is rather self-explanatory, I will not go into the details of each here.
However, a brief description of each item is provided in section V.

You are responsible for checking the image results after the NEWFILES images are created, and
again after the stable images are created. To do this, look at
~nlp/NEWFILES/<domain>/<domain>_diff.test for NEWFILES, and at
~nlp/pundit/<domain>/<domain>_diff.test for the stable images. These files show any changes in
the current image relative to the previous image, that is, whether the current result is
consistent with the result from the last administration. If there are differences, you need to
check with the author(s) of the code to see whether the difference is intended.
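
The idea behind the <domain>_diff.test check can be sketched in a few lines of Python. This is not how SAP produces its diff files; the function result_differences is hypothetical and uses the standard difflib module to compare the previous and current test results line by line.

```python
import difflib


def result_differences(previous_lines, current_lines):
    """Return unified-diff lines; an empty list means the results agree."""
    return list(difflib.unified_diff(previous_lines, current_lines,
                                     fromfile="previous", tofile="current",
                                     lineterm=""))


unchanged = result_differences(["parse ok", "3 readings"],
                               ["parse ok", "3 readings"])
changed = result_differences(["parse ok", "3 readings"],
                             ["parse ok", "4 readings"])
```

An empty result corresponds to an image that is consistent with the last administration; any lines beginning with "-" or "+" are differences to take up with the author(s) of the code.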


IV. Files important to the administrator

SAP creates a number of log and err files during its tour of system administration. The files
of interest to an administrator monitoring the progress are (in ~nlp/bin/SA_bin):

NEWFILES: 

NEWFILES.log 

NEWFILES_time.log 

NEWFILES.err 

pundit: 

pundit.log 

pundit_time.log 

pundit.err 

redo: 

redo.log 

redo_time.log 

redo.err 

The <NEWFILES, pundit, redo>.log files tell you which step SAP is at in the image
making and testing process. They contain the messages printed by SAP at each stage of
making/testing an image.

The *time.log files tell you the time a process started/ended. The purpose of these files is to
keep a time stamp on each step of making/testing an image. Before a process is started, the file
gets a time stamp. After a process is completed, the file gets another time stamp.
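
The time-stamping scheme is simple enough to sketch. The Python fragment below is an illustration under stated assumptions, not SAP's shell code: run_with_time_stamps is a hypothetical name, the log is an in-memory list rather than a *time.log file, and the closing stamp is written even if the step fails, which is a choice made for this sketch.

```python
import datetime


def run_with_time_stamps(step_name, action, log_lines,
                         clock=datetime.datetime.now):
    """Run action(), stamping log_lines before it starts and after it ends."""
    log_lines.append(f"{step_name} started {clock().isoformat()}")
    try:
        return action()
    finally:
        # Written whether or not the step succeeded (an assumption here).
        log_lines.append(f"{step_name} ended {clock().isoformat()}")


time_log = []
result = run_with_time_stamps("make muck image", lambda: "done", time_log)
```

After the call, time_log holds one "started" line and one "ended" line for the step, so the elapsed time of each step can be read off the log.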

The *.err files contain the diagnostic/error messages produced by SAP. By looking at the error
messages in these files, you can tell whether there were any problems with making/testing the
images, and what kind of problems they were.


V.  Main  menu  choices 

(1)  SAP first checks whether there is enough disk space. We have decided that h% available
     disk space is required for us to have a successful image making round. If there isn't
     enough space, SAP will ask whether you would like to see a list of images under ~nlp, and
     can even send a request to delete images out to the group if you would like.

     SAP allows a user to make all the images (pundit, casreps, muck, ships, trident, and
     opreps), or just a subset of the domain images. If only a subset of the images is created,
     a test_<domain>_<current date>.log file is created from the previous log. This is to
     ensure uniformity, so that the next time SAP will be able to find the proper log files.

(2)  SAP provides update_pundit as a menu choice following 1, so you don't have to find out
     what comes after making NEWFILES images.

(3)  After you have checked the NEWFILES images, you will go on to make the stable images.
     SAP allows a user to make all the images, or just a subset of domain images, just as
     for NEWFILES.

(4)  After you have checked the stable images and they are correct, SAP cleans out the
     NEWFILES, stable, and SAP directories. This step will take more time, as SAP prompts you
     for permission to "save" or "rm" before every erasure. This is the last step of the whole
     process.

(5)  Each time after creating images (steps 1 and 3), you might find mistakes in the images,
     and you will want to redo the incorrect images. SAP allows you to do a partial restart via
     menu choices. It prompts you for (y/n) input, allowing you to redo all the images, or just
     a subset of the domain(s).
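
The disk-space check in step (1) can be pictured as follows. This is a sketch in Python rather than SAP's own shell script; enough_disk_space and required_free_fraction are hypothetical names, and the threshold is a parameter here, not a value taken from SAP.

```python
import shutil


def enough_disk_space(path, required_free_fraction):
    """True if the file system holding path has at least the given fraction free."""
    usage = shutil.disk_usage(path)        # named tuple: total, used, free
    return usage.free / usage.total >= required_free_fraction


always_enough = enough_disk_space(".", 0.0)
never_enough = enough_disk_space(".", 1.1)
```

A caller would make this check before starting an image making round and, on a False result, offer the list of existing images as described in step (1).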


Control  Flow 